Applied mathematics rests on two central pillars: calculus and linear algebra. While calculus
has its roots in the universal laws of Newtonian physics, linear algebra arises from a
much  more  mundane  issue:  the  need  to  solve  simple  systems  of  linear  algebraic  equations. 
Despite  its  humble  origins,  linear  algebra  ends  up  playing  a  comparably  profound  role  in 
both  applied  and  theoretical  mathematics,  as  well  as  in  all  of  science  and  engineering, 
including computer science, data analysis and machine learning, imaging and signal processing,
probability and statistics, economics, numerical analysis, mathematical biology,
and many other disciplines. Nowadays, a proper grounding in both calculus and linear algebra
is an essential prerequisite for a successful career in science, technology, engineering,
statistics,  data  science,  and,  of  course,  mathematics. 

Since  Newton,  and,  to  an  even  greater  extent  following  Einstein,  modern  science  has 
been  confronted  with  the  inherent  nonlinearity  of  the  macroscopic  universe.  But  most  of 
our  insight  and  progress  is  based  on  linear  approximations.  Moreover,  at  the  atomic  level, 
quantum  mechanics  remains  an  inherently  linear  theory.  (The  complete  reconciliation 
of  linear  quantum  theory  with  the  nonlinear  relativistic  universe  remains  the  holy  grail 
of  modern  physics.)  Only  with  the  advent  of  large-scale  computers  have  we  been  able 
to  begin  to  investigate  the  full  complexity  of  natural  phenomena.  But  computers  rely 
on  numerical  algorithms,  and  these  in  turn  require  manipulating  and  solving  systems  of 
algebraic  equations.  Now,  rather  than  just  a  handful  of  equations,  we  may  be  confronted 
by  gigantic  systems  containing  thousands  (or  even  millions)  of  unknowns.  Without  the 
discipline  of  linear  algebra  to  formulate  systematic,  efficient  solution  algorithms,  as  well 
as  the  consequent  insight  into  how  to  proceed  when  the  numerical  solution  is  insufficiently 
accurate,  we  would  be  unable  to  make  progress  in  the  linear  regime,  let  alone  make  sense 
of  the  truly  nonlinear  physical  universe. 

Linear algebra can thus be viewed as the mathematical apparatus needed to solve potentially
huge linear systems, to understand their underlying structure, and to apply what
is  learned  in  other  contexts.  The  term  “linear”  is  the  key,  and,  in  fact,  it  refers  not  just 
to  linear  algebraic  equations,  but  also  to  linear  differential  equations,  both  ordinary  and 
partial,  linear  boundary  value  problems,  linear  integral  equations,  linear  iterative  systems, 
linear  control  systems,  and  so  on.  It  is  a  profound  truth  that,  while  outwardly  different, 
all  linear  systems  are  remarkably  similar  at  their  core.  Basic  mathematical  principles  such 
as  linear  superposition,  the  interplay  between  homogeneous  and  inhomogeneous  systems, 
the  Fredholm  alternative  characterizing  solvability,  orthogonality,  positive  definiteness  and 
minimization  principles,  eigenvalues  and  singular  values,  and  linear  iteration,  to  name  but 
a  few,  reoccur  in  surprisingly  many  ostensibly  unrelated  contexts. 

In the late nineteenth and early twentieth centuries, mathematicians came to the realization
that all of these disparate techniques could be subsumed in the edifice now known
as linear algebra. Understanding, and, more importantly, exploiting the apparent similarities
between, say, algebraic equations and differential equations, requires us to become
more  sophisticated  —  that  is,  more  abstract  —  in  our  mode  of  thinking.  The  abstraction 
process distills the essence of the problem away from all its distracting particularities, and,
seen  in  this  light,  all  linear  systems  rest  on  a  common  mathematical  framework.  Don’t  be 
afraid!  Abstraction  is  not  new  in  your  mathematical  education.  In  elementary  algebra, 
you  already  learned  to  deal  with  variables,  which  are  the  abstraction  of  numbers.  Later, 
the  abstract  concept  of  a  function  formalized  particular  relations  between  variables,  say 
distance, velocity, and time, or mass, acceleration, and force. In linear algebra, the abstraction
is raised to yet a further level, in that one views apparently different types of objects
(vectors, matrices, functions, ...) and systems (algebraic, differential, integral, ...) in a
common  conceptual  framework.  (And  this  is  by  no  means  the  end  of  the  mathematical 
abstraction  process;  modern  category  theory,  [37],  abstractly  unites  different  conceptual 
frameworks.) 

In  applied  mathematics,  we  do  not  introduce  abstraction  for  its  intrinsic  beauty.  Our 
ultimate  purpose  is  to  develop  effective  methods  and  algorithms  for  applications  in  science, 
engineering,  computing,  statistics,  data  science,  etc.  For  us,  abstraction  is  driven  by  the 
need for understanding and insight, and is justified only if it aids in the solution to real-world
problems and the development of analytical and computational tools. Whereas to the
beginning  student  the  initial  concepts  may  seem  designed  merely  to  bewilder  and  confuse, 
one  must  reserve  judgment  until  genuine  applications  appear.  Patience  and  perseverance 
are  vital.  Once  we  have  acquired  some  familiarity  with  basic  linear  algebra,  significant, 
interesting  applications  will  be  readily  forthcoming.  In  this  text,  we  encounter  graph  theory 
and  networks,  mechanical  structures,  electrical  circuits,  quantum  mechanics,  the  geometry 
underlying  computer  graphics  and  animation,  signal  and  image  processing,  interpolation 
and  approximation,  dynamical  systems  modeled  by  linear  differential  equations,  vibrations, 
resonance,  and  damping,  probability  and  stochastic  processes,  statistics,  data  analysis, 
splines  and  modern  font  design,  and  a  range  of  powerful  numerical  solution  algorithms,  to 
name  a  few.  Further  applications  of  the  material  you  learn  here  will  appear  throughout 
your  mathematical  and  scientific  career. 

This  textbook  has  two  interrelated  pedagogical  goals.  The  first  is  to  explain  basic 
techniques  that  are  used  in  modern,  real-world  problems.  But  we  have  not  written  a  mere 
mathematical  cookbook  —  a  collection  of  linear  algebraic  recipes  and  algorithms.  We 
believe  that  it  is  important  for  the  applied  mathematician,  as  well  as  the  scientist  and 
engineer,  not  just  to  learn  mathematical  techniques  and  how  to  apply  them  in  a  variety 
of  settings,  but,  even  more  importantly,  to  understand  why  they  work  and  how  they  are 
derived  from  first  principles.  In  our  approach,  applications  go  hand  in  hand  with  theory, 
each reinforcing and inspiring the other. To this end, we try to guide the reader through the
reasoning that leads to the important results. We do not shy away from stating theorems
and  writing  out  proofs,  particularly  when  they  lead  to  insight  into  the  methods  and  their 
range  of  applicability.  We  hope  to  spark  that  eureka  moment,  when  you  realize  “Yes, 
of  course!  I  could  have  come  up  with  that  if  I’d  only  sat  down  and  thought  it  out.” 
Most  concepts  in  linear  algebra  are  not  all  that  difficult  at  their  core,  and,  by  grasping 
their  essence,  not  only  will  you  know  how  to  apply  them  in  routine  contexts,  you  will 
understand  what  may  be  required  to  adapt  to  unusual  or  recalcitrant  problems.  And,  the 
further  you  go  on  in  your  studies  or  work,  the  more  you  realize  that  very  few  real-world 
problems  fit  neatly  into  the  idealized  framework  outlined  in  a  textbook.  So  it  is  (applied) 
mathematical reasoning and not mere linear algebraic technique that is the core and raison
d’être of this text!

Applied mathematics can be broadly divided into three mutually reinforcing components.
The first is modeling: how one derives the governing equations from physical
principles. The second is solution techniques and algorithms: methods for solving the
model  equations.  The  third,  perhaps  least  appreciated  but  in  many  ways  most  important, 
are  the  frameworks  that  incorporate  disparate  analytical  methods  into  a  few  broad  themes. 
The  key  paradigms  of  applied  linear  algebra  to  be  covered  in  this  text  include 

•  Gaussian  Elimination  and  factorization  of  matrices; 

•  linearity  and  linear  superposition; 

•  span,  linear  independence,  basis,  and  dimension; 

•  inner  products,  norms,  and  inequalities; 

•  compatibility  of  linear  systems  via  the  Fredholm  alternative; 

•  positive  definiteness  and  minimization  principles; 

•  orthonormality  and  the  Gram-Schmidt  process; 

•  least  squares  solutions,  interpolation,  and  approximation; 

•  linear  functions  and  linear  and  affine  transformations; 

•  eigenvalues  and  eigenvectors/eigenfunctions; 

•  singular  values  and  principal  component  analysis; 

•  linear  iteration,  including  Markov  processes  and  numerical  solution  schemes; 

•  linear  systems  of  ordinary  differential  equations,  stability,  and  matrix  exponentials; 

•  vibrations, quasi-periodicity, damping, and resonance.


These are all interconnected parts of a very general applied mathematical edifice of remarkable
power and practicality. Understanding such broad themes of applied mathematics is
our overarching objective. Indeed, this book began life as a part of a much larger work,
whose goal is to similarly cover the full range of modern applied mathematics, both linear
and nonlinear, at an advanced undergraduate level. The second installment is now in
print, as the first author’s text on partial differential equations, [61], which forms a natural
extension of the linear analytical methods and theoretical framework developed here,
now  in  the  context  of  the  equilibria  and  dynamics  of  continuous  media,  Fourier  analysis, 
and  so  on.  Our  inspirational  source  was  and  continues  to  be  the  visionary  texts  of  Gilbert 
Strang,  [79,  80].  Based  on  students’  reactions,  our  goal  has  been  to  present  a  more  linearly 
ordered  and  less  ambitious  development  of  the  subject,  while  retaining  the  excitement  and 
interconnectedness  of  theory  and  applications  that  is  evident  in  Strang’s  works. 


Syllabi  and  Prerequisites 


This  text  is  designed  for  three  potential  audiences: 

•  A beginning, in-depth course covering the fundamentals of linear algebra and its applications for highly motivated and mathematically mature students.

•  A second undergraduate course in linear algebra, with an emphasis on those methods and concepts that are important in applications.

•  A beginning graduate-level course in linear mathematics for students in engineering, physical science, computer science, numerical analysis, statistics, and even mathematical biology, finance, economics, social sciences, and elsewhere, as well as master’s students in applied mathematics.

Although  most  students  reading  this  book  will  have  already  encountered  some  basic 
linear  algebra  —  matrices,  vectors,  systems  of  linear  equations,  basic  solution  techniques, 
etc.  —  the  text  makes  no  such  assumptions.  Indeed,  the  first  chapter  starts  at  the  very 
beginning  by  introducing  linear  algebraic  systems,  matrices,  and  vectors,  followed  by  very 
basic Gaussian Elimination. We do assume that the reader has taken a standard two-year
calculus sequence. One-variable calculus (derivatives and integrals) will be used
without comment; multivariable calculus will appear only fleetingly and in an inessential
way.  The  ability  to  handle  scalar,  constant  coefficient  linear  ordinary  differential  equations 
is  also  assumed,  although  we  do  briefly  review  elementary  solution  techniques  in  Chapter  7. 
Proofs  by  induction  will  be  used  on  occasion.  But  the  most  essential  prerequisite  is  a 
certain  degree  of  mathematical  maturity  and  willingness  to  handle  the  increased  level  of 
abstraction  that  lies  at  the  heart  of  contemporary  linear  algebra. 

Survey  of  Topics 

In  addition  to  introducing  the  fundamentals  of  matrices,  vectors,  and  Gaussian  Elimination 
from  the  beginning,  the  initial  chapter  delves  into  perhaps  less  familiar  territory,  such  as 
the (permuted) LU and LDV decompositions, and the practical numerical issues underlying
the solution algorithms, thereby highlighting the computational efficiency of Gaussian
Elimination coupled with Back Substitution versus methods based on the inverse matrix
or determinants, as well as the use of pivoting to mitigate possibly disastrous effects of
numerical round-off errors. Because the goal is to learn practical algorithms employed
in contemporary applications, matrix inverses and determinants are de-emphasized;
indeed, the most efficient way to compute a determinant is via Gaussian Elimination,
which  remains  the  key  algorithm  throughout  the  initial  chapters. 
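
As a rough illustration of the preceding remarks, the following Python sketch solves a small system by Gaussian Elimination with partial pivoting followed by Back Substitution, and recovers the determinant as a by-product (the product of the pivots, with a sign change for each row interchange). The function name `solve_and_det` and the list-of-rows matrix format are assumptions made for this example; they are not taken from the text, and the code is a minimal sketch rather than a production solver.

```python
def solve_and_det(A, b):
    """Solve A x = b by Gaussian Elimination with partial pivoting.

    A is a square matrix given as a list of rows; b is the right-hand side.
    Returns the pair (x, det A). Raises ValueError if A is singular.
    """
    n = len(A)
    # Work on an augmented copy so the caller's data is left untouched.
    M = [[float(a) for a in row] + [float(rhs)] for row, rhs in zip(A, b)]
    det = 1.0
    for j in range(n):
        # Partial pivoting: use the largest entry in column j on or below the diagonal.
        p = max(range(j, n), key=lambda i: abs(M[i][j]))
        if M[p][j] == 0.0:
            raise ValueError("matrix is singular")
        if p != j:
            M[j], M[p] = M[p], M[j]
            det = -det  # each row interchange flips the sign of the determinant
        det *= M[j][j]
        # Eliminate the entries below the pivot.
        for i in range(j + 1, n):
            factor = M[i][j] / M[j][j]
            for k in range(j, n + 1):
                M[i][k] -= factor * M[j][k]
    # Back Substitution on the resulting upper triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x, det
```

For instance, `solve_and_det([[2, 1], [1, 3]], [3, 5])` should return a solution close to `[0.8, 1.4]` together with a determinant close to `5`. No inverse matrix is ever formed.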

Chapter  2  is  the  heart  of  linear  algebra,  and  a  successful  course  rests  on  the  students’ 
ability  to  assimilate  the  absolutely  essential  concepts  of  vector  space,  subspace,  span,  linear 
independence,  basis,  and  dimension.  While  these  ideas  may  well  have  been  encountered 
in  an  introductory  ordinary  differential  equation  course,  it  is  rare,  in  our  experience,  that 
students  at  this  level  are  at  all  comfortable  with  them.  The  underlying  mathematics  is  not 
particularly difficult, but enabling the student to come to grips with a new level of abstraction
remains the most challenging aspect of the course. To this end, we have included a
wide range of illustrative examples. Students should start by making sure they understand
how a concept applies to vectors in Euclidean space R^n before pressing on to less familiar
territory. While one could design a course that completely avoids infinite-dimensional
function  spaces,  we  maintain  that,  at  this  level,  they  should  be  integrated  into  the  subject 
right  from  the  start.  Indeed,  linear  analysis  and  applied  mathematics,  including  Fourier 
methods, boundary value problems, partial differential equations, numerical solution techniques,
signal processing, control theory, modern physics, especially quantum mechanics,
and many, many other fields, both pure and applied, all rely on basic vector space constructions,
and so learning to deal with the full range of examples is the secret to future
success. Section 2.5 then introduces the fundamental subspaces associated with a matrix:
kernel (null space), image (column space), coimage (row space), and cokernel (left null
space). This leads to what is known as the Fundamental Theorem of Linear Algebra, which
highlights the remarkable interplay between a matrix and its transpose. The role of these
spaces  in  the  characterization  of  solutions  to  linear  systems,  e.g.,  the  basic  superposition 
principles,  is  emphasized.  The  final  Section  2.6  covers  a  nice  application  to  graph  theory, 
in  preparation  for  later  developments. 

Chapter  3  discusses  general  inner  products  and  norms,  using  the  familiar  dot  product 
and Euclidean distance as motivational examples. Again, we develop both the finite-dimensional
and function space cases in tandem. The fundamental Cauchy-Schwarz inequality
is easily derived in this abstract framework, and the more familiar triangle
inequality, for norms derived from inner products, is a simple consequence. This leads to
the  definition  of  a  general  norm  and  the  induced  matrix  norm,  of  fundamental  importance 
in iteration, analysis, and numerical methods. The classification of inner products on Euclidean
space leads to the important class of positive definite matrices. Gram matrices,
constructed  out  of  inner  products  of  elements  of  inner  product  spaces,  are  a  particularly 
fruitful  source  of  positive  definite  and  semi-definite  matrices,  and  reappear  throughout  the 
text. Tests for positive definiteness rely on Gaussian Elimination and the connections between
the LDL^T factorization of symmetric matrices and the process of completing the
square  in  a  quadratic  form.  We  have  deferred  treating  complex  vector  spaces  until  the 
final  section  of  this  chapter  —  only  the  definition  of  an  inner  product  is  not  an  evident 
adaptation  of  its  real  counterpart. 

Chapter  4  exploits  the  many  advantages  of  orthogonality.  The  use  of  orthogonal  and 
orthonormal bases creates a dramatic speed-up in basic computational algorithms. Orthogonal
matrices, constructed out of orthonormal bases, play a major role both in geometry
and graphics, where they represent rigid rotations and reflections, and in notable
numerical algorithms. The orthogonality of the fundamental matrix subspaces leads to a
linear  algebraic  version  of  the  Fredholm  alternative  for  compatibility  of  linear  systems.  We 
develop  several  versions  of  the  basic  Gram-Schmidt  process  for  converting  an  arbitrary 
basis  into  an  orthogonal  basis,  used  in  particular  to  construct  orthogonal  polynomials  and 
functions. When implemented on bases of R^n, the algorithm becomes the celebrated QR
factorization  of  a  nonsingular  matrix.  The  final  section  surveys  an  important  application  to 
contemporary  signal  and  image  processing:  the  discrete  Fourier  representation  of  a  sampled 
signal,  culminating  in  the  justly  famous  Fast  Fourier  Transform. 
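
To make the Gram-Schmidt process mentioned above a little more concrete, here is a minimal Python sketch for vectors in R^n under the ordinary dot product. It follows one common variant, in which each projection is subtracted from the partially orthogonalized vector as soon as it is computed; whether this matches any particular version developed in the text is an assumption. The names `dot` and `gram_schmidt` and the tolerance `1e-12` are hypothetical choices made for this example.

```python
from math import sqrt

def dot(u, v):
    """Ordinary dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Convert linearly independent vectors into an orthonormal basis
    of the same subspace, using the dot product on R^n."""
    ortho = []
    for v in basis:
        w = [float(a) for a in v]
        # Remove the components of w along the vectors already constructed.
        for u in ortho:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = sqrt(dot(w, w))
        if norm < 1e-12:  # hypothetical tolerance for detecting dependence
            raise ValueError("vectors are (numerically) linearly dependent")
        ortho.append([wi / norm for wi in w])
    return ortho
```

Placing the resulting vectors as the columns of a matrix gives an orthogonal matrix; recording the coefficients `c` and the norms along the way would yield the triangular factor of a QR factorization.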

Chapter  5  is  devoted  to  solving  the  most  basic  multivariable  minimization  problem: 
a  quadratic  function  of  several  variables.  The  solution  is  reduced,  by  a  purely  algebraic 
computation,  to  a  linear  system,  and  then  solved  in  practice  by,  for  example,  Gaussian 
Elimination.  Applications  include  finding  the  closest  element  of  a  subspace  to  a  given 
point,  which  is  reinterpreted  as  the  orthogonal  projection  of  the  element  onto  the  subspace, 
and  results  in  the  least  squares  solution  to  an  incompatible  linear  system.  Interpolation 
of data points by polynomials, trigonometric functions, splines, etc., and least squares approximation
of discrete data and continuous functions are thereby handled in a common
conceptual  framework. 
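
As a small illustration of how a least squares problem reduces to a linear system, the following Python sketch fits a line y = a + b t to data points by forming and solving the 2 x 2 normal equations directly. The function name `fit_line` and the data format are assumptions made for this example, and solving the normal equations by an explicit formula is only sensible for such a tiny system.

```python
def fit_line(ts, ys):
    """Least squares line y = a + b*t through the data points (t_i, y_i).

    Solves the normal equations  A^T A c = A^T y,  where A is the m x 2
    matrix whose i-th row is (1, t_i) and c = (a, b).
    """
    m = len(ts)
    s_t = sum(ts)
    s_tt = sum(t * t for t in ts)
    s_y = sum(ys)
    s_ty = sum(t * y for t, y in zip(ts, ys))
    # Determinant of the 2 x 2 coefficient matrix [[m, s_t], [s_t, s_tt]].
    det = m * s_tt - s_t * s_t
    if det == 0:
        raise ValueError("need at least two distinct sample points")
    a = (s_tt * s_y - s_t * s_ty) / det
    b = (m * s_ty - s_t * s_y) / det
    return a, b
```

When the data lie exactly on a line, as in `fit_line([0, 1, 2], [1, 3, 5])`, the fit reproduces that line (here a = 1, b = 2); otherwise it returns the line minimizing the sum of squared vertical errors.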

Chapter  6  covers  some  striking  applications  of  the  preceding  developments  in  mechanics 
and  electrical  circuits.  We  introduce  a  general  mathematical  structure  that  governs  a  wide 
range  of  equilibrium  problems.  To  illustrate,  we  start  with  simple  mass-spring  chains, 
followed  by  electrical  networks,  and  finish  by  analyzing  the  equilibrium  configurations  and 
the  stability  properties  of  general  structures.  Extensions  to  continuous  mechanical  and 
electrical  systems  governed  by  boundary  value  problems  for  ordinary  and  partial  differential 
equations  can  be  found  in  the  companion  text  [61]. 

Chapter  7  delves  into  the  general  abstract  foundations  of  linear  algebra,  and  includes 
significant  applications  to  geometry.  Matrices  are  now  viewed  as  a  particular  instance 
of  linear  functions  between  vector  spaces,  which  also  include  linear  differential  operators, 
linear  integral  operators,  quantum  mechanical  operators,  and  so  on.  Basic  facts  about  linear 
systems,  such  as  linear  superposition  and  the  connections  between  the  homogeneous  and 
inhomogeneous  systems,  which  were  already  established  in  the  algebraic  context,  are  shown 
to  be  of  completely  general  applicability.  Linear  functions  and  slightly  more  general  affine 
functions  on  Euclidean  space  represent  basic  geometrical  transformations  —  rotations, 
shears,  translations,  screw  motions,  etc.  —  and  so  play  an  essential  role  in  modern  computer 
graphics, movies, animation, gaming, design, elasticity, crystallography, symmetry, etc.
Further,  the  elementary  transpose  operation  on  matrices  is  viewed  as  a  particular  case 
of  the  adjoint  operation  on  linear  functions  between  inner  product  spaces,  leading  to  a 
general  theory  of  positive  definiteness  that  characterizes  solvable  quadratic  minimization 
problems,  with  far-reaching  consequences  for  modern  functional  analysis,  partial  differential 
equations,  and  the  calculus  of  variations,  all  fundamental  in  physics  and  mechanics. 

Chapters 8-10 are concerned with eigenvalues and their many applications, including
data analysis, numerical methods, and linear dynamical systems, both continuous
and  discrete.  After  motivating  the  fundamental  definition  of  eigenvalue  and  eigenvector 
through  the  quest  to  solve  linear  systems  of  ordinary  differential  equations,  the  remainder 
of  Chapter  8  develops  the  basic  theory  and  a  range  of  applications,  including  eigenvector 
bases,  diagonalization,  the  Schur  decomposition,  and  the  Jordan  canonical  form.  Practical 
computational  schemes  for  determining  eigenvalues  and  eigenvectors  are  postponed  until 
Chapter  9.  The  final  two  sections  cover  the  singular  value  decomposition  and  principal 
component  analysis,  of  fundamental  importance  in  modern  statistical  analysis  and  data 
science. 

Chapter 9 employs eigenvalues to analyze discrete dynamics, as governed by linear iterative
systems. The formulation of their stability properties leads us to define the spectral
radius  and  further  develop  matrix  norms.  Section  9.3  contains  applications  to  Markov 
chains arising in probabilistic and stochastic processes. We then discuss practical alternatives
to Gaussian Elimination for solving linear systems, including the iterative Jacobi,
Gauss-Seidel, and Successive Over-Relaxation (SOR) schemes, as well as methods for computing
eigenvalues and eigenvectors including the Power Method and its variants, and the
striking  QR  algorithm,  including  a  new  proof  of  its  convergence.  Section  9.6  introduces 
more  recent  semi-direct  iterative  methods  based  on  Krylov  subspaces  that  are  increasingly 
employed  to  solve  the  large  sparse  linear  systems  arising  in  the  numerical  solution  of  partial 
differential  equations  and  elsewhere:  Arnoldi  and  Lanczos  methods,  Conjugate  Gradients 
(CG),  the  Full  Orthogonalization  Method  (FOM),  and  the  Generalized  Minimal  Residual 
Method (GMRES). The chapter concludes with a short introduction to wavelets, a powerful
modern alternative to classical Fourier analysis, now used extensively throughout signal
processing  and  imaging  science. 
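
The Power Method mentioned above is itself a linear iterative system, and a basic version can be sketched in a few lines of Python. The sketch assumes that the matrix has a single eigenvalue of largest absolute value and that the starting vector has a nonzero component along the corresponding eigenvector; the names `mat_vec` and `power_method`, the starting vector of all ones, and the fixed iteration count are hypothetical choices made for this example.

```python
from math import sqrt

def mat_vec(A, v):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power_method(A, iterations=200):
    """Estimate the dominant eigenvalue of A and a unit eigenvector.

    Assumes a unique eigenvalue of largest absolute value; convergence
    is slow when the two largest eigenvalues are close in magnitude.
    """
    v = [1.0] * len(A)  # hypothetical starting vector
    lam = 0.0
    for _ in range(iterations):
        w = mat_vec(A, v)
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("iterate collapsed to the zero vector")
        # Normalize at every step to avoid overflow or underflow.
        v = [x / norm for x in w]
        # Rayleigh quotient v . (A v) for the unit vector v.
        lam = sum(x * y for x, y in zip(v, mat_vec(A, v)))
    return lam, v
```

For the diagonal matrix with entries 2 and 1, the iterates line up with the first coordinate axis and the eigenvalue estimate approaches 2, at a rate governed by the ratio 1/2 of the two eigenvalues.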

The  final  Chapter  10  applies  eigenvalues  to  linear  dynamical  systems  modeled  by  systems 
of  ordinary  differential  equations.  After  developing  basic  solution  techniques,  the  focus 
shifts  to  understanding  the  qualitative  properties  of  solutions  and  particularly  the  role 
of  eigenvalues  in  the  stability  of  equilibria.  The  two-dimensional  case  is  discussed  in  full 
detail,  culminating  in  a  complete  classification  of  the  possible  phase  portraits  and  stability 
properties.  Matrix  exponentials  are  introduced  as  an  alternative  route  to  solving  first  order 
homogeneous  systems,  and  are  also  applied  to  solve  the  inhomogeneous  version,  as  well  as 
to  geometry,  symmetry,  and  group  theory.  Our  final  topic  is  second  order  linear  systems, 
which  model  dynamical  motions  and  vibrations  in  mechanical  structures  and  electrical 
circuits.  In  the  absence  of  frictional  damping  and  instabilities,  solutions  are  quasiperiodic 
combinations  of  the  normal  modes.  We  finish  by  briefly  discussing  the  effects  of  damping 
and  of  periodic  forcing,  including  its  potentially  catastrophic  role  in  resonance. 

Course  Outlines 

Our  book  includes  far  more  material  than  can  be  comfortably  covered  in  a  single  semester; 
a  full  year’s  course  would  be  able  to  do  it  justice.  If  you  do  not  have  this  luxury,  several 
possible semester and quarter courses can be extracted from the wealth of material and
applications. 

First,  the  core  of  basic  linear  algebra  that  all  students  should  know  includes  the  following 
topics,  which  are  indexed  by  the  section  numbers  where  they  appear: 

•  Matrices, vectors, Gaussian Elimination, matrix factorizations, Forward and Back Substitution, inverses, determinants: 1.1-1.6, 1.8-1.9.

•  Vector spaces, subspaces, linear independence, bases, dimension: 2.1-2.5.

•  Inner products and their associated norms: 3.1-3.3.

•  Orthogonal vectors, bases, matrices, and projections: 4.1-4.4.

•  Positive definite matrices and minimization of quadratic functions: 3.4-3.5, 5.2.

•  Linear functions and linear and affine transformations: 7.1-7.3.

•  Eigenvalues and eigenvectors: 8.2-8.3.

•  Linear iterative systems: 9.1-9.2.

With  these  in  hand,  a  variety  of  thematic  threads  can  be  extracted,  including: 

•  Minimization, least squares, data fitting and interpolation: 4.5, 5.3-5.5.

•  Dynamical systems: 8.4, 8.6 (Jordan canonical form), 10.1-10.4.

•  Engineering applications: Chapter 6, 10.1-10.2, 10.5-10.6.

•  Data analysis: 5.3-5.5, 8.5, 8.7-8.8.

•  Numerical methods: 8.6 (Schur decomposition), 8.7, 9.1-9.2, 9.4-9.6.

•  Signal processing: 3.6, 5.6, 9.7.

•  Probabilistic and statistical applications: 8.7-8.8, 9.3.

•  Theoretical  foundations  of  linear  algebra:  Chapter  7. 

For  a  first  semester  or  quarter  course,  we  recommend  covering  as  much  of  the  core 
as  possible,  and,  if  time  permits,  at  least  one  of  the  threads,  our  own  preference  being 
the  material  on  structures  and  circuits.  One  option  for  streamlining  the  syllabus  is  to 
concentrate on finite-dimensional vector spaces, bypassing the function space material,
although  this  would  deprive  the  students  of  important  insight  into  the  full  scope  of  linear 
algebra. 

For a second course in linear algebra, the students are typically familiar with elementary
matrix methods, including the basics of matrix arithmetic, Gaussian Elimination,
determinants, inverses, dot product and Euclidean norm, eigenvalues, and, often, first order
systems of ordinary differential equations. Thus, much of Chapter 1 can be reviewed
quickly.  On  the  other  hand,  the  more  abstract  fundamentals,  including  vector  spaces,  span, 
linear  independence,  basis,  and  dimension  are,  in  our  experience,  still  not  fully  mastered, 
and  one  should  expect  to  spend  a  significant  fraction  of  the  early  part  of  the  course  covering 
these  essential  topics  from  Chapter  2  in  full  detail.  Beyond  the  core  material,  there  should 
be  time  for  a  couple  of  the  indicated  threads  depending  on  the  audience  and  interest  of  the 
instructor. 

Similar considerations hold for a beginning graduate-level course for scientists and engineers.
Here, the emphasis should be on applications required by the students, particularly
numerical  methods  and  data  analysis,  and  function  spaces  should  be  firmly  built  into  the 
class  from  the  outset.  As  always,  the  students’  mastery  of  the  first  five  sections  of  Chapter  2 
remains  of  paramount  importance. 

Comments on Individual Chapters

Chapter 1: On the assumption that the students have already seen matrices, vectors,
Gaussian  Elimination,  inverses,  and  determinants,  most  of  this  material  will  be  review  and 
should  be  covered  at  a  fairly  rapid  pace.  On  the  other  hand,  the  LU  decomposition  and  the 
emphasis  on  solution  techniques  centered  on  Forward  and  Back  Substitution,  in  contrast  to 
impractical schemes involving matrix inverses and determinants, might be new. Section
1.7, on the practical/numerical aspects of Gaussian Elimination, is optional.

Chapter 2: The crux of the course. A key decision is whether to incorporate infinite-dimensional
vector spaces, as is recommended and done in the text, or to have an abbreviated
syllabus that covers only finite-dimensional spaces, or, even more restrictively, only
R^n and subspaces thereof. The last section, on graph theory, can be skipped unless you
plan  on  covering  Chapter  6  and  (parts  of)  the  final  sections  of  Chapters  9  and  10. 

Chapter  3:  Inner  products  and  positive  definite  matrices  are  essential,  but,  under  time 
constraints,  one  can  delay  Section  3.3,  on  more  general  norms,  as  they  begin  to  matter 
only  in  the  later  stages  of  Chapters  8  and  9.  Section  3.6,  on  complex  vector  spaces,  can 
be  deferred  until  the  discussions  of  complex  eigenvalues,  complex  linear  systems,  and  real 
and  complex  solutions  to  linear  iterative  and  differential  equations;  on  the  other  hand,  it 
is  required  in  Section  5.6,  on  discrete  Fourier  analysis. 

Chapter 4: The basics of orthogonality, as covered in Sections 4.1-4.4, should be an
essential part of the students’ training, although one can certainly omit the final subsections
of Sections 4.2 and 4.3. The final section, on orthogonal polynomials, is optional.

Chapter 5: We recommend covering the solution of quadratic minimization problems
and at least the basics of least squares. The applications (approximation of data,
interpolation and approximation by polynomials, trigonometric functions, more general functions,
and splines, etc.) are all optional, as is the final section on discrete Fourier methods and
the Fast Fourier Transform.

Chapter  6  provides  a  welcome  relief  from  the  theory  for  the  more  applied  students  in  the 
class,  and  is  one  of  our  favorite  parts  to  teach.  While  it  may  well  be  skipped,  the  material 
is  particularly  appealing  for  a  class  with  engineering  students.  One  could  specialize  to  just 
the  material  on  mass/spring  chains  and  structures,  or,  alternatively,  on  electrical  circuits 
with  the  connections  to  spectral  graph  theory,  based  on  Section  2.6,  and  further  developed 
in  Section  8.7. 

Chapter 7: The first third of this chapter, on linear functions, linear and affine
transformations, and geometry, is part of the core. The remainder of the chapter recasts many
of the linear algebraic techniques already encountered in the context of matrices and
vectors in Euclidean space in a more general abstract framework, and could be skimmed over
or entirely omitted if time is an issue, with the relevant constructions introduced in the
context of more concrete developments, as needed.

Chapter  8:  Eigenvalues  are  absolutely  essential.  The  motivational  material  based  on 
solving  systems  of  differential  equations  in  Section  8.1  can  be  skipped  over.  Sections  8.2 
and 8.3 are the heart of the matter. Of the remaining sections, the material on symmetric
matrices should have the highest priority, leading to singular values and principal
component  analysis  and  a  variety  of  numerical  methods. 




Chapter 9: If time permits, the first two sections are well worth covering. For a
numerically oriented class, Sections 9.4-9.6 would be a priority, whereas Section 9.3 studies Markov
processes, an appealing probabilistic/stochastic application. The chapter concludes with
an optional introduction to wavelets, which is somewhat off-topic, but nevertheless serves
to combine orthogonality and iterative methods in a compelling and important modern
application.

Chapter  10  is  devoted  to  linear  systems  of  ordinary  differential  equations,  their  solutions, 
and  their  stability  properties.  The  basic  techniques  will  be  a  repeat  to  students  who  have 
already  taken  an  introductory  linear  algebra  and  ordinary  differential  equations  course,  but 
the  more  advanced  material  will  be  new  and  of  interest. 

Changes  from  the  First  Edition 

For  the  Second  Edition,  we  have  revised  and  edited  the  entire  manuscript,  correcting  all 
known  errors  and  typos,  and,  we  hope,  not  introducing  any  new  ones!  Some  of  the  existing 
material  has  been  rearranged.  The  most  significant  change  is  having  moved  the  chapter  on 
orthogonality to before the minimization and least squares chapter, since orthogonal vectors,
bases, and subspaces, as well as the Gram-Schmidt process and orthogonal projection,
play an absolutely fundamental role in much of the later material. In this way, it is easier
to  skip  over  Chapter  5  with  minimal  loss  of  continuity.  Matrix  norms  now  appear  much 
earlier  in  Section  3.3,  since  they  are  employed  in  several  other  locations.  The  second  major 
reordering  is  to  switch  the  chapters  on  iteration  and  dynamics,  in  that  the  former  is  more 
attuned  to  linear  algebra,  while  the  latter  is  oriented  towards  analysis.  In  the  same  vein, 
space  constraints  compelled  us  to  delete  the  last  chapter  of  the  first  edition,  which  was  on 
boundary  value  problems.  Although  this  material  serves  to  emphasize  the  importance  of 
the  abstract  linear  algebraic  techniques  developed  throughout  the  text,  now  extended  to 
infinite-dimensional  function  spaces,  the  material  contained  therein  can  now  all  be  found 
in the first author’s Springer Undergraduate Text in Mathematics, Introduction to Partial
Differential Equations, [61], with the exception of the subsection on splines, which now
appears  at  the  end  of  Section  5.5. 

There  are  several  significant  additions: 

•  In recognition of their increasingly essential role in modern data analysis and
   statistics, Section 8.7, on singular values, has been expanded, continuing into the new
   Section 8.8, on Principal Component Analysis, which includes a brief introduction
   to basic statistical data analysis.

•  We have added a new Section 9.6, on Krylov subspace methods, which are increasingly
   employed to devise effective and efficient numerical solution schemes for sparse linear
   systems and eigenvalue calculations.

•  Section 8.4 introduces and characterizes invariant subspaces, in recognition of their
   importance to dynamical systems, both finite- and infinite-dimensional, as well as
   linear iterative systems, and linear control systems. (Much as we would have liked
   also to add material on linear control theory, space constraints ultimately interfered.)

•  We included some basics of spectral graph theory, of importance in contemporary
   theoretical computer science, data analysis, networks, imaging, etc., starting in
   Section 2.6 and continuing to the graph Laplacian, introduced, in the context of
   electrical networks, in Section 6.2, along with its spectrum (eigenvalues and singular
   values) in Section 8.7.

•  We decided to include a short Section 9.7, on wavelets. While this perhaps fits more
   naturally with Section 5.6, on discrete Fourier analysis, the convergence proofs rely
   on the solution to an iterative linear system and hence on preceding developments
   in Chapter 9.

•  A number of new exercises have been added, in the new sections and also scattered
   throughout the text.

Following  the  advice  of  friends,  colleagues,  and  reviewers,  we  have  also  revised  some 
of  the  less  standard  terminology  used  in  the  first  edition  to  bring  it  closer  to  the  more 
commonly accepted practices. Thus “range” is now “image” and “target space” is now
“codomain”. The terms “special lower/upper triangular matrix” are now “lower/upper
unitriangular matrix”, thus drawing attention to their unipotence. On the other hand, the
term  “regular”  for  a  square  matrix  admitting  an  LU  factorization  has  been  kept,  since 
there  is  really  no  suitable  alternative  appearing  in  the  literature.  Finally,  we  decided  to 
retain  our  term  “complete”  for  a  matrix  that  admits  a  complex  eigenvector  basis,  in  lieu  of 
“diagonalizable”  (which  depends  upon  whether  one  deals  in  the  real  or  complex  domain), 
“semi-simple”,  or  “perfect”.  This  choice  permits  us  to  refer  to  a  “complete  eigenvalue”, 
independent  of  the  underlying  status  of  the  matrix. 

Exercises  and  Software 

Exercises  appear  at  the  end  of  almost  every  subsection,  and  come  in  a  medley  of  flavors. 
Each  exercise  set  starts  with  some  straightforward  computational  problems  to  test  students’ 
comprehension  and  reinforce  the  new  techniques  and  ideas.  Ability  to  solve  these  basic 
problems  should  be  thought  of  as  a  minimal  requirement  for  learning  the  material.  More 
advanced  and  theoretical  exercises  tend  to  appear  later  on  in  the  set.  Some  are  routine, 
but  others  are  challenging  computational  problems,  computer-based  exercises  and  projects, 
details  of  proofs  that  were  not  given  in  the  text,  additional  practical  and  theoretical  results 
of  interest,  further  developments  in  the  subject,  etc.  Some  will  challenge  even  the  most 
advanced  student. 

As  a  guide,  some  of  the  exercises  are  marked  with  special  signs: 

indicates  an  exercise  that  is  used  at  some  point  in  the  text,  or  is  important  for  further 
development  of  the  subject. 

T  indicates  a  project  —  usually  an  exercise  with  multiple  interdependent  parts. 

X  indicates  an  exercise  that  requires  (or  at  least  strongly  recommends)  use  of  a  computer. 
The  student  could  either  be  asked  to  write  their  own  computer  code  in,  say,  Matlab, 
Mathematica,  Maple,  etc.,  or  make  use  of  pre-existing  software  packages. 

X  =  X  +  T  indicates  a  computer  project. 

Advice to instructors: Don’t be afraid to assign only a couple of parts of a multi-part
exercise. We have found the True/False exercises to be a particularly useful indicator of
a student’s level of understanding. Emphasize to the students that a full answer is not
merely a T or F, but must include a detailed explanation of the reason, e.g., a proof, or a
counterexample, or a reference to a result in the text, etc.

Conventions and Notations


Note: A full symbol and notation index can be found at the end of the book.


Equations  are  numbered  consecutively  within  chapters,  so  that,  for  example,  (3.12) 
refers  to  the  12th  equation  in  Chapter  3.  Theorems,  Lemmas,  Propositions,  Definitions, 
and  Examples  are  also  numbered  consecutively  within  each  chapter,  using  a  common  index. 
Thus,  in  Chapter  1,  Lemma  1.2  follows  Definition  1.1,  and  precedes  Theorem  1.3  and 
Example  1.4.  We  find  this  numbering  system  to  be  the  most  conducive  for  navigating 
through  the  book. 

References  to  books,  papers,  etc.,  are  listed  alphabetically  at  the  end  of  the  text,  and 
are  referred  to  by  number.  Thus,  [61]  indicates  the  61st  listed  reference,  which  happens  to 
be  the  first  author’s  partial  differential  equations  text. 

Q.E.D.  is  placed  at  the  end  of  a  proof,  being  the  abbreviation  of  the  classical  Latin  phrase 
quod erat demonstrandum, which can be translated as “what was to be demonstrated”.


$\mathbb{R}$, $\mathbb{C}$, $\mathbb{Z}$, $\mathbb{Q}$ denote, respectively, the real numbers, the complex numbers, the integers,
and the rational numbers. We use $e \approx 2.71828182845904\ldots$ to denote the base of the
natural logarithm, $\pi = 3.14159265358979\ldots$ for the area of a circle of unit radius, and $i$
to denote the imaginary unit, i.e., one of the two square roots of $-1$, the other being $-i$.


The absolute value of a real number $x$ is denoted by $|x|$; more generally, $|z|$ denotes the
modulus of the complex number $z$.


We consistently use boldface lowercase letters, e.g., $\mathbf{v}, \mathbf{x}, \mathbf{a}$, to denote vectors (almost
always column vectors), whose entries are the corresponding non-bold subscripted letter:
$v_1, x_i, a_n$, etc. Matrices are denoted by ordinary capital letters, e.g., $A, C, K, M$, but
not all such letters refer to matrices; for instance, $V$ often refers to a vector space, $L$ to
a linear function, etc. The entries of a matrix, say $A$, are indicated by the corresponding
subscripted lowercase letters, $a_{ij}$ being the entry in its $i$th row and $j$th column.

We use the standard notations

$$\sum_{i=1}^{n} a_i = a_1 + a_2 + \cdots + a_n, \qquad \prod_{i=1}^{n} a_i = a_1 a_2 \cdots a_n,$$

for the sum and product of the quantities $a_1, \ldots, a_n$. We use max and min to denote
maximum and minimum, respectively, of a closed subset of $\mathbb{R}$. Modular arithmetic is
indicated by $j \equiv k \bmod n$, for $j, k, n \in \mathbb{Z}$ with $n > 0$, to mean $j - k$ is divisible by $n$.
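As a small illustration of the sum, product, and congruence notation above, here is a minimal Python sketch. The helper name `congruent` and the sample list `a` are hypothetical, chosen only for this example; they do not come from the text.

```python
from math import prod


def congruent(j: int, k: int, n: int) -> bool:
    """Return True when j ≡ k mod n, i.e. when n divides j - k (requires n > 0)."""
    if n <= 0:
        raise ValueError("modulus n must be positive")
    return (j - k) % n == 0


# Hypothetical sample quantities a_1, ..., a_n.
a = [2, 3, 5, 7]
total = sum(a)       # a_1 + a_2 + ... + a_n
product = prod(a)    # a_1 a_2 ... a_n
```

Testing divisibility of `j - k` directly mirrors the definition in the text, and it behaves the same way for negative integers.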


We use $S = \{\, f \mid C \,\}$ to denote a set, where $f$ is a formula for the members of the
set and $C$ is a list of conditions, which may be empty, in which case it is omitted. For
example, $\{\, x \mid 0 \le x \le 1 \,\}$ means the closed unit interval from 0 to 1, also denoted $[0,1]$,
while $\{\, ax^2 + bx + c \mid a, b, c \in \mathbb{R} \,\}$ is the set of real quadratic polynomials, and $\{0\}$ is the
set consisting only of the number 0. We write $x \in S$ to indicate that $x$ is an element of the
set $S$, while $y \notin S$ says that $y$ is not an element. The cardinality, or number of elements,
in the set $A$, which may be infinite, is denoted by $\#A$. The union and intersection of the
sets $A$, $B$ are respectively denoted by $A \cup B$ and $A \cap B$. The subset notation $A \subset B$
includes the possibility that the sets might be equal, although for emphasis we sometimes
write $A \subseteq B$, while $A \subsetneq B$ specifically implies that $A \ne B$. We can also write $A \subset B$ as
$B \supset A$. We use $B \setminus A = \{\, x \mid x \in B,\ x \notin A \,\}$ to denote the set-theoretic difference, meaning
all elements of $B$ that do not belong to $A$.

An arrow is used in two senses: first, to indicate convergence of a sequence: $x_n \to x^\star$
as $n \to \infty$; second, to indicate a function, so $f\colon X \to Y$ means that $f$ defines a function
from the domain set $X$ to the codomain set $Y$, written $y = f(x) \in Y$ for $x \in X$. We use
$\equiv$ to emphasize when two functions agree everywhere, so $f(x) \equiv 1$ means that $f$ is the
constant function, equal to 1 at all values of $x$. Composition of functions is denoted $f \circ g$.
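For finite sets, the set notation described above has direct counterparts in Python's built-in `set` type. The following is only an illustrative sketch; the particular sets `A`, `B`, and `S` are made up for the example.

```python
# Hypothetical finite sets used only for illustration.
A = {1, 2, 3}
B = {1, 2, 3, 4, 5}

# Set-builder form { f | C }: here f is x*x and C is "0 <= x <= 5 and x even".
S = {x * x for x in range(6) if x % 2 == 0}

cardinality = len(A)        # #A
union = A | B               # A ∪ B
intersection = A & B        # A ∩ B
difference = B - A          # B \ A
is_subset = A <= B          # A ⊂ B in the text's sense (equality allowed)
is_proper_subset = A < B    # A ⊊ B (subset with A ≠ B)
member = 4 in S             # 4 ∈ S
```

Note that the text's $A \subset B$ permits equality, so it corresponds to Python's `<=`, while the strict `<` matches $A \subsetneq B$.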

Angles  are  always  measured  in  radians  (although  occasionally  degrees  will  be  mentioned 
in  descriptive  sentences).  All  trigonometric  functions,  cos,  sin,  tan,  sec,  etc.,  are  evaluated 
on  radians.  (Make  sure  your  calculator  is  locked  in  radian  mode!) 

As usual, we denote the natural exponential function by $e^x$. We always use $\log x$ for
its inverse, the natural (base $e$) logarithm (never the ugly modern version $\ln x$), while
$\log_a x = \log x / \log a$ is used for logarithms with base $a$.
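Python's standard library happens to follow the same convention: `math.log` with one argument is the natural logarithm. A minimal sketch of the change-of-base formula, with `log_base` as a hypothetical helper name and assuming $x > 0$, $a > 0$, $a \ne 1$:

```python
import math


def log_base(a: float, x: float) -> float:
    """Compute log_a x = log x / log a using natural logarithms."""
    return math.log(x) / math.log(a)


three = log_base(2.0, 8.0)    # log_2 8, approximately 3 up to rounding
one = math.log(math.e)        # natural logarithm of e
```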

We  follow  the  reference  tome  [59]  (whose  mathematical  editor  is  the  first  author’s  father) 
and use $\operatorname{ph} z$ for the phase of a complex number. We prefer this to the more common term
“argument”, which is also used to refer to the argument of a function $f(z)$, while “phase”
is completely unambiguous.
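Python's `cmath` module uses the same word, so modulus and phase can be illustrated as follows. This is a sketch with an arbitrary sample value `z`; the branch `(-pi, pi]` is Python's convention and is only assumed here to match the text's.

```python
import cmath
import math

z = complex(-1.0, 1.0)       # hypothetical sample value z = -1 + i

r = abs(z)                   # modulus |z|
theta = cmath.phase(z)       # ph z, in radians, in the interval (-pi, pi]
z_back = cmath.rect(r, theta)  # rebuild z from modulus and phase
```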

We will employ a variety of standard notations for derivatives. In the case of ordinary
derivatives, the most basic is the Leibnizian notation $\dfrac{du}{dx}$ for the derivative of $u$ with
respect to $x$; an alternative is the Lagrangian prime notation $u'$. Higher order derivatives
are similar, with $u''$ denoting $\dfrac{d^2u}{dx^2}$, while $u^{(n)}$ denotes the $n$th order derivative $\dfrac{d^nu}{dx^n}$. If the
function depends on time, $t$, instead of space, $x$, then we use the Newtonian dot notation,
$\dot u = \dfrac{du}{dt}$, $\ddot u = \dfrac{d^2u}{dt^2}$. We use the full Leibniz notation $\dfrac{\partial u}{\partial x}$, $\dfrac{\partial u}{\partial t}$, $\dfrac{\partial^2 u}{\partial x^2}$, $\dfrac{\partial^2 u}{\partial x\,\partial t}$, for partial
derivatives of functions of several variables. All functions are assumed to be sufficiently
smooth that any indicated derivatives exist and mixed partial derivatives are equal, cf. [2].


Definite integrals are denoted by $\int_a^b f(x)\,dx$, while $\int f(x)\,dx$ is the corresponding
indefinite integral or anti-derivative. In general, limits are denoted by $\lim_{x \to y}$, while $\lim_{x \to y^+}$
and $\lim_{x \to y^-}$ are used to denote the two one-sided limits in $\mathbb{R}$.


History  and  Biography 

Mathematics is both a historical and a social activity, and many of the algorithms,
theorems, and formulas are named after famous (and, on occasion, not-so-famous)
mathematicians, scientists, engineers, etc., usually, but not necessarily, the one(s) who first came up
with the idea. We try to indicate first names, approximate dates, and geographic locations
of most of the named contributors. Readers who are interested in additional historical
details, complete biographies, and, when available, portraits or photos, are urged to consult
the wonderful University of St. Andrews MacTutor History of Mathematics archive:

http://www-history.mcs.st-and.ac.uk


Some  Final  Remarks 

To the student: You are about to learn modern applied linear algebra. We hope you
enjoy the experience and profit from it in your future studies and career. (Indeed, we
recommend holding on to this book to use for future reference.) Please send us your
comments and suggestions for improvement, along with any errors you might spot. Did you
find  our  explanations  helpful  or  confusing?  Were  enough  examples  included  in  the  text? 
Were  the  exercises  of  sufficient  variety  and  at  an  appropriate  level  to  enable  you  to  learn 
the  material? 

To the instructor: Thank you for adopting our text! We hope you enjoy teaching from
it as much as we enjoyed writing it. Whatever your experience, we want to hear from you.
Let us know which parts you liked and which you didn’t, which sections worked and which
were less successful, and which parts your students enjoyed, struggled with, or disliked.
How can we improve it?

Like every author, we sincerely hope that we have written an error-free text. Indeed, all
known  errors  in  the  first  edition  have  been  corrected  here.  On  the  other  hand,  judging  from 
experience,  we  know  that,  no  matter  how  many  times  you  proofread,  mistakes  still  manage 
to  sneak  through.  So  we  ask  your  indulgence  to  correct  the  few  (we  hope)  that  remain. 
Even  better,  email  us  with  your  questions,  typos,  mathematical  errors  and  obscurities, 
comments,  suggestions,  etc. 

The  second  edition’s  dedicated  web  site 

http://www.math.umn.edu/~olver/ala2.html

will  contain  a  list  of  known  errors,  commentary,  feedback,  and  resources,  as  well  as  a 
number  of  illustrative  Matlab  programs  that  we’ve  used  when  teaching  the  course.  Links 
to  the  Selected  Solutions  Manual  will  also  be  posted  there. 




Acknowledgments 

First,  let  us  express  our  profound  gratitude  to  Gil  Strang  for  his  continued  encouragement 
from  the  very  beginning  of  this  undertaking.  Readers  familiar  with  his  groundbreaking 
texts  and  remarkable  insight  can  readily  find  his  influence  throughout  our  book.  We 
thank Pavel Belik, Tim Garoni, Donald Kahn, Markus Keel, Cristina Santa Marta,
Nilima Nigam, Greg Pierce, Fadil Santosa, Wayne Schmaedeke, Jackie Shen, Peter Shook,
Thomas Scofield, and Richard Varga, as well as our classes and students, particularly
Taiala Carvalho, Colleen Duffy, and Ryan Lloyd, and last, but certainly not least, our late
father/father-in-law Frank W.J. Olver and son Sheehan Olver, for proofreading,
corrections, remarks, and useful suggestions that helped us create the first edition. We
acknowledge Mikhail Shvartsman’s contributions to the arduous task of writing out the solutions
manual. We also acknowledge the helpful feedback from the reviewers of the original
manuscript: Augustin Banyaga, Robert Cramer, James Curry, Jerome Dancis, Bruno
Harris, Norman Johnson, Cerry Klein, Doron Lubinsky, Juan Manfredi, Fabio Augusto
Milner, Tzuong-Tsieng Moh, Paul S. Muhly, Juan Carlos Álvarez Paiva, John F. Rossi,
Brian Shader, Shagi-Di Shih, Tamas Wiandt, and two anonymous reviewers.

We thank many readers and students for their strongly encouraging remarks, which
cumulatively helped inspire us to contemplate making this new edition. We would particularly
like to thank Nihat Bayhan, Joe Benson, James Broomfield, Juan Cockburn, Richard Cook,
Stephen DeSalvo, Anne Dougherty, Ken Driessel, Kathleen Fuller, Mary Halloran,
Stuart Hastings, David Hiebeler, Jeffrey Humpherys, Roberta Jaskolski, Tian-Jun Li, James
Meiss, Willard Miller, Jr., Sean Rostami, Arnd Scheel, Timo Schürg, David Tieri, Peter
Webb,  Timothy  Welle,  and  an  anonymous  reviewer  for  their  comments  on,  suggestions  for, 
and  corrections  to  the  three  printings  of  the  first  edition  that  have  led  to  this  improved 
second  edition.  We  particularly  want  to  thank  Linda  Ness  for  extensive  help  with  the 
sections  on  SVD  and  PCA,  including  suggestions  for  some  of  the  exercises.  We  also  thank 
David  Kramer  for  his  meticulous  proofreading  of  the  text. 

And of course, we owe an immense debt to Loretta Bartolini and Achi Dosanjh at
Springer, first for encouraging us to take on a second edition, and then for their willingness
to work with us to produce the book you now have in hand, and especially for Loretta’s
unwavering support, patience, and advice during the preparation of the manuscript, including
encouraging us to adopt and helping perfect the full-color layout, which we hope you enjoy.


Peter  J.  Olver 
University  of  Minnesota 

olver@umn.edu


Cheri  Shakiban 
University  of  St.  Thomas 

cshakiban@stthomas.edu


Minnesota,  March  2018 


Table  of  Contents 

Preface

Chapter 1. Linear Algebraic Systems
  1.1. Solution of Linear Systems
  1.2. Matrices and Vectors
    Matrix Arithmetic
  1.3. Gaussian Elimination - Regular Case
    Elementary Matrices
    The LU Factorization
    Forward and Back Substitution
  1.4. Pivoting and Permutations
    Permutations and Permutation Matrices
    The Permuted LU Factorization
  1.5. Matrix Inverses
    Gauss-Jordan Elimination
    Solving Linear Systems with the Inverse
    The LDV Factorization
  1.6. Transposes and Symmetric Matrices
    Factorization of Symmetric Matrices
  1.7. Practical Linear Algebra
    Tridiagonal Matrices
    Pivoting Strategies
  1.8. General Linear Systems
    Homogeneous Systems
  1.9. Determinants

Chapter 2. Vector Spaces and Bases
  2.1. Real Vector Spaces
  2.2. Subspaces
  2.3. Span and Linear Independence
    Linear Independence and Dependence
  2.4. Basis and Dimension
  2.5. The Fundamental Matrix Subspaces
    Kernel and Image
    The Superposition Principle
    Adjoint Systems, Cokernel, and Coimage
    The Fundamental Theorem of Linear Algebra
  2.6. Graphs and Digraphs

Chapter 3. Inner Products and Norms
  3.1. Inner Products
    Inner Products on Function Spaces
  3.2. Inequalities
    The Cauchy-Schwarz Inequality
    Orthogonal Vectors
    The Triangle Inequality
  3.3. Norms
    Unit Vectors
    Equivalence of Norms
    Matrix Norms
  3.4. Positive Definite Matrices
    Gram Matrices
  3.5. Completing the Square
    The Cholesky Factorization
  3.6. Complex Vector Spaces
    Complex Numbers
    Complex Vector Spaces and Inner Products

Chapter 4. Orthogonality
  4.1. Orthogonal and Orthonormal Bases
    Computations in Orthogonal Bases
  4.2. The Gram-Schmidt Process
    Modifications of the Gram-Schmidt Process
  4.3. Orthogonal Matrices
    The QR Factorization
    Ill-Conditioned Systems and Householder’s Method
  4.4. Orthogonal Projections and Orthogonal Subspaces
    Orthogonal Projection
    Orthogonal Subspaces
    Orthogonality of the Fundamental Matrix Subspaces and the Fredholm Alternative
  4.5. Orthogonal Polynomials
    The Legendre Polynomials
    Other Systems of Orthogonal Polynomials

Chapter 5. Minimization and Least Squares
  5.1. Minimization Problems
    Equilibrium Mechanics
    Solution of Equations
    The Closest Point
  5.2. Minimization of Quadratic Functions
  5.3. The Closest Point
  5.4. Least Squares
  5.5. Data Fitting and Interpolation
    Polynomial Approximation and Interpolation
    Approximation and Interpolation by General Functions
    Least Squares Approximation in Function Spaces
    Orthogonal Polynomials and Least Squares
    Splines
  5.6. Discrete Fourier Analysis and the Fast Fourier Transform
    Compression and Denoising
    The Fast Fourier Transform

Chapter 6. Equilibrium
  6.1. Springs and Masses
    Positive Definiteness and the Minimization Principle
  6.2. Electrical Networks
    Batteries, Power, and the Electrical-Mechanical Correspondence
  6.3. Structures

Chapter 7. Linearity
  7.1. Linear Functions
    Linear Operators
    The Space of Linear Functions
    Dual Spaces
    Composition
    Inverses
  7.2. Linear Transformations
    Change of Basis
  7.3. Affine Transformations and Isometries
    Isometry
  7.4. Linear Systems
    The Superposition Principle
    Inhomogeneous Systems
    Superposition Principles for Inhomogeneous Systems
    Complex Solutions to Real Systems
  7.5. Adjoints, Positive Definite Operators, and Minimization Principles
    Self-Adjoint and Positive Definite Linear Functions
    Minimization

Chapter 8. Eigenvalues and Singular Values
  8.1. Linear Dynamical Systems
    Scalar Ordinary Differential Equations
    First Order Dynamical Systems
  8.2. Eigenvalues and Eigenvectors
    Basic Properties of Eigenvalues
    The Gerschgorin Circle Theorem
  8.3. Eigenvector Bases
    Diagonalization
  8.4. Invariant Subspaces
  8.5. Eigenvalues of Symmetric Matrices
    The Spectral Theorem
    Optimization Principles for Eigenvalues of Symmetric Matrices
  8.6. Incomplete Matrices
    The Schur Decomposition
    The Jordan Canonical Form
  8.7. Singular Values
    The Pseudoinverse
    The Euclidean Matrix Norm
    Condition Number and Rank
    Spectral Graph Theory
  8.8. Principal Component Analysis
    Variance and Covariance
    The Principal Components

Chapter 9. Iteration
  9.1. Linear Iterative Systems
    Scalar Systems
    Powers of Matrices
    Diagonalization and Iteration
  9.2. Stability
    Spectral Radius
    Fixed Points
    Matrix Norms and Convergence
  9.3. Markov Processes
  9.4. Iterative Solution of Linear Algebraic Systems
    The Jacobi Method
    The Gauss-Seidel Method
    Successive Over-Relaxation
  9.5. Numerical Computation of Eigenvalues
    The Power Method
    The QR Algorithm
    Tridiagonalization
  9.6. Krylov Subspace Methods
    Krylov Subspaces
    Arnoldi Iteration
    The Full Orthogonalization Method
    The Conjugate Gradient Method
    The Generalized Minimal Residual Method
  9.7. Wavelets
    The Haar Wavelets
    Modern Wavelets
    Solving the Dilation Equation

Chapter 10. Dynamics
  10.1. Basic Solution Techniques
    The Phase Plane
    Existence and Uniqueness
    Complete Systems
    The General Case
  10.2. Stability of Linear Systems
  10.3. Two-Dimensional Systems
    Distinct Real Eigenvalues
    Complex Conjugate Eigenvalues
    Incomplete Double Real Eigenvalue
    Complete Double Real Eigenvalue
  10.4. Matrix Exponentials
    Applications in Geometry
    Invariant Subspaces and Linear Dynamical Systems
    Inhomogeneous Linear Systems
  10.5. Dynamics of Structures
    Stable Structures
    Unstable Structures
    Systems with Differing Masses
    Friction and Damping
  10.6. Forcing and Resonance
    Electrical Circuits
    Forcing and Resonance in Systems

References

Symbol Index

Subject Index



Chapter  1 

Linear  Algebraic  Systems 

Linear  algebra  is  the  core  of  modern  applied  mathematics.  Its  humble  origins  are  to  be 
found  in  the  need  to  solve  “elementary”  systems  of  linear  algebraic  equations.  But  its 
ultimate  scope  is  vast,  impinging  on  all  of  mathematics,  both  pure  and  applied,  as  well 
as  numerical  analysis,  statistics,  data  science,  physics,  engineering,  mathematical  biology, 
financial mathematics, and every other discipline in which mathematical methods are required.
A thorough grounding in the methods and theory of linear algebra is an essential
prerequisite  for  understanding  and  harnessing  the  power  of  mathematics  throughout  its 
multifaceted  applications. 

In the first chapter, our focus will be on the most basic method for solving linear
algebraic systems, known as Gaussian Elimination in honor of one of the all-time
mathematical greats, the early nineteenth-century German mathematician Carl Friedrich Gauss,
although the method appears in Chinese mathematical texts from around 150 CE, if not
earlier, and was also known to Isaac Newton. Gaussian Elimination is quite elementary,
but remains one of the most important algorithms in applied (as well as theoretical)
mathematics. Our initial focus will be on the most important class of systems: those involving
the same number of equations as unknowns, although we will eventually develop
techniques for handling completely general linear systems. While the former typically have
a unique solution, general linear systems may have either no solutions or infinitely many
solutions. Since physical models require existence and uniqueness of their solution, the
systems arising in applications often (but not always) involve the same number of equations
as unknowns. Nevertheless, the ability to confidently handle all types of linear systems
is a basic prerequisite for further progress in the subject. In contemporary applications,
particularly those arising in numerical solutions of differential equations, in signal and
image processing, and in data analysis, the governing linear systems can be
huge, sometimes involving millions of equations in millions of unknowns, challenging even
the most powerful supercomputer. So, a systematic and careful development of solution
techniques is essential. Section 1.7 discusses some of the practical issues and limitations in
computer implementations of the Gaussian Elimination method for large systems arising
in applications.

Modern  linear  algebra  relies  on  the  basic  concepts  of  scalar,  vector,  and  matrix,  and 
so  we  must  quickly  review  the  fundamentals  of  matrix  arithmetic.  Gaussian  Elimination 
can  be  profitably  reinterpreted  as  a  certain  matrix  factorization,  known  as  the  (permuted) 
LU  decomposition,  which  provides  valuable  insight  into  the  solution  algorithms.  Matrix 
inverses and determinants are also discussed in brief, primarily for their theoretical
properties. As we shall see, formulas relying on the inverse or the determinant are extremely
inefficient, and so, except in low-dimensional or highly structured environments, are to
be avoided in almost all practical computations. In the theater of applied linear algebra,
Gaussian Elimination and matrix factorization are the stars, while inverses and
determinants are relegated to the supporting cast.

1.1  Solution  of  Linear  Systems 

Gaussian  Elimination  is  a  simple,  systematic  algorithm  to  solve  systems  of  linear  equations. 
It  is  the  workhorse  of  linear  algebra,  and,  as  such,  of  absolutely  fundamental  importance 
in applied mathematics. In this section, we review the method in the most important case,
in  which  there  is  the  same  number  of  equations  as  unknowns.  The  general  situation  will 
be  deferred  until  Section  1.8. 

To  illustrate,  consider  an  elementary  system  of  three  linear  equations 

x + 2y + z = 2,

2x + 6y + z = 7,  (1.1)

x + y + 4z = 3,


in three unknowns x, y, z. Linearity† refers to the fact that the unknowns only appear to
the  first  power,  and  there  are  no  product  terms  like  xy  or  xyz.  The  basic  solution  method 
is  to  systematically  employ  the  following  fundamental  operation: 


Linear  System  Operation  #1: 


Add  a  multiple  of  one  equation  to  another  equation. 


Before  continuing,  you  might  try  to  convince  yourself  that  this  operation  doesn’t  change 
the  solutions  to  the  system.  Our  goal  is  to  judiciously  apply  the  operation  and  so  be  led  to 
a  much  simpler  linear  system  that  is  easy  to  solve,  and,  moreover,  has  the  same  solutions 
as  the  original.  Any  linear  system  that  is  derived  from  the  original  system  by  successive 
application  of  such  operations  will  be  called  an  equivalent  system.  By  the  preceding  remark, 
equivalent  linear  systems  have  the  same  solutions. 

The  systematic  feature  is  that  we  successively  eliminate  the  variables  in  our  equations 
in  order  of  appearance.  We  begin  by  eliminating  the  first  variable,  x,  from  the  second 
equation.  To  this  end,  we  subtract  twice  the  first  equation  from  the  second,  leading  to  the 
equivalent  system 

x + 2y + z = 2,

2y-z  =  3,  (1.2) 

x  +  y  +  4z  =  3. 

Next,  we  eliminate  x  from  the  third  equation  by  subtracting  the  first  equation  from  it: 

x  +  2y  +  z  =  2, 

2y-z  =  3,  (1.3) 

-y  +  3z  =  1. 

The  equivalent  system  (1.3)  is  already  simpler  than  the  original  (1.1).  Notice  that  the 
second  and  third  equations  do  not  involve  x  (by  design)  and  so  constitute  a  system  of  two 
linear  equations  for  two  unknowns.  Moreover,  once  we  have  solved  this  subsystem  for  y 
and  z,  we  can  substitute  the  answer  into  the  first  equation,  and  we  need  only  solve  a  single 
linear  equation  for  x. 

We continue on in this fashion, the next phase being the elimination of the second
variable, y, from the third equation by adding 1/2 the second equation to it. The result is

x + 2y + z = 2,

2y - z = 3,  (1.4)

(5/2) z = 5/2,

which  is  the  simple  system  we  are  after.  It  is  in  what  is  called  triangular  form,  which  means 
that,  while  the  first  equation  involves  all  three  variables,  the  second  equation  involves  only 
the  second  and  third  variables,  and  the  last  equation  involves  only  the  last  variable. 
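The reduction from (1.1) to (1.4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `add_multiple` and `to_triangular` are hypothetical, each equation is stored as a row of coefficients followed by its right-hand side, and the sketch assumes that no zero ever appears in a pivot position.

```python
from fractions import Fraction

def add_multiple(rows, src, dst, factor):
    """Linear System Operation #1: add factor * (equation src) to equation dst."""
    rows[dst] = [d + factor * s for s, d in zip(rows[src], rows[dst])]

def to_triangular(rows):
    """Reduce an n x n system, stored as rows [a_1, ..., a_n, b], to triangular form."""
    rows = [[Fraction(v) for v in row] for row in rows]  # exact arithmetic
    n = len(rows)
    for j in range(n):
        pivot = rows[j][j]
        if pivot == 0:
            raise ValueError("zero pivot: this sketch does not handle that case")
        for i in range(j + 1, n):
            # Eliminate variable j from equation i.
            add_multiple(rows, j, i, -rows[i][j] / pivot)
    return rows

# System (1.1): x + 2y + z = 2, 2x + 6y + z = 7, x + y + 4z = 3.
system = [[1, 2, 1, 2],
          [2, 6, 1, 7],
          [1, 1, 4, 3]]
triangular = to_triangular(system)
for row in triangular:
    print([str(v) for v in row])
```

Exact fractions are used so that the last row comes out as (5/2) z = 5/2, matching (1.4), rather than as a floating-point approximation.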


† The “official” definition of linearity will be deferred until Chapter 7.

Any triangular system can be straightforwardly solved by the method of Back
Substitution. As the name suggests, we work backwards, solving the last equation first, which
requires that z = 1. We substitute this result back into the penultimate equation, which
becomes 2y - 1 = 3, with solution y = 2. We finally substitute these two values for y and
z into the first equation, which becomes x + 5 = 2, and so the solution to the triangular
system (1.4) is

x = -3,  y = 2,  z = 1.  (1.5)

Moreover,  since  we  used  only  our  basic  linear  system  operation  to  pass  from  (1.1)  to  the 
triangular  system  (1.4),  this  is  also  the  solution  to  the  original  system  of  linear  equations, 
as  you  can  check.  We  note  that  the  system  (1.1)  has  a  unique  —  meaning  one  and  only 
one  —  solution,  namely  (1.5). 
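Back Substitution is equally short to write down. The following Python sketch assumes the triangular system is stored as rows of coefficients followed by the right-hand side, and that every diagonal coefficient is nonzero; the name `back_substitute` is hypothetical.

```python
from fractions import Fraction

def back_substitute(rows):
    """Solve a triangular system stored as rows [a_1, ..., a_n, b], last equation first."""
    n = len(rows)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        # Contribution of the unknowns that have already been found.
        known = sum(rows[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (Fraction(rows[i][n]) - known) / rows[i][i]
    return x

# Triangular system (1.4).
triangular = [[1, 2, 1, 2],
              [0, 2, -1, 3],
              [0, 0, Fraction(5, 2), Fraction(5, 2)]]
solution = back_substitute(triangular)

# Check the result against the original system (1.1).
original = [[1, 2, 1, 2],
            [2, 6, 1, 7],
            [1, 1, 4, 3]]
residuals = [sum(a * v for a, v in zip(row[:3], solution)) - row[3] for row in original]
print(solution, residuals)
```

The residuals of the original system all vanish, which is the check suggested in the text.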

And  that,  barring  a  few  minor  complications  that  can  crop  up  from  time  to  time,  is  all 
that  there  is  to  the  method  of  Gaussian  Elimination!  It  is  extraordinarily  simple,  but  its 
importance  cannot  be  overemphasized.  Before  exploring  the  relevant  issues,  it  will  help  to 
reformulate  our  method  in  a  more  convenient  matrix  notation. 


Exercises 


1.1.1.  Solve  the  following  systems  of  linear  equations  by  reducing  to  triangular  form  and  then 
using  Back  Substitution. 

(a) x - y = 7,  x + 2y = 3;

(b) 6u + v = 5,  3u - 2v = 5;

(c) p + q - r = 0,  2p - q + 3r = 3,  -p - q = 6;

(d) 2u - v + 2w = 2,  -u - v + 3w = 1,  3u - 2w = 1;

(e) 5x1 + 3x2 - x3 = 9,  3x1 + 2x2 - x3 = 5,  x1 + x2 + x3 = -1;

(f) x + z - 2w = -3,  2x - y + 2z - w = -5,  -6y - 4z + 2w = 2,  x + 3y + 2z - w = 1;

(g) 3x1 + x2 = 1,  x1 + 3x2 + x3 = 1,  x2 + 3x3 + x4 = 1,  x3 + 3x4 = 1.

1.1.2. How should the coefficients a, b, and c be chosen so that the system ax + by + cz = 3,
ax - y + cz = 1, x + by - cz = 2, has the solution x = 1, y = 2 and z = -1?


1.1.3. The system 2x = -6, -4x + 3y = 3, x + 4y - z = 7, is in lower triangular form.

(a)  Formulate  a  method  of  Forward  Substitution  to  solve  it.  (b)  What  happens  if  you 
reduce  the  system  to  (upper)  triangular  form  using  the  algorithm  in  this  section? 

(c)  Devise  an  algorithm  that  uses  our  linear  system  operation  to  reduce  a  system  to  lower 
triangular  form  and  then  solve  it  by  Forward  Substitution,  (d)  Check  your  algorithm  by 
applying  it  to  one  or  two  of  the  systems  in  Exercise  1.1.1.  Are  you  able  to  solve  them  in  all 
cases? 


1.2  Matrices  and  Vectors 


A  matrix  is  a  rectangular  array  of  numbers.  Thus, 
are all examples of matrices. We use the notation

$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix} \tag{1.6}
$$


for a general matrix of size m × n (read “m by n”), where m denotes the number of rows in
A and n denotes the number of columns. Thus, the preceding examples of matrices have
respective sizes 2 × 3, 4 × 2, 1 × 3, 2 × 1, and 2 × 2. A matrix is square if m = n, i.e., it
has the same number of rows as columns. A column vector is an m × 1 matrix, while a row
vector is a 1 × n matrix. As we shall see, column vectors are by far the more important
of the two, and the term “vector” without qualification will always mean “column vector”.
A 1 × 1 matrix, which has but a single entry, is both a row and a column vector.

The number that lies in the ith row and the jth column of A is called the (i, j) entry
of A, and is denoted by $a_{ij}$. The row index always appears first and the column index
second.‡ Two matrices are equal, A = B, if and only if they have the same size, say m × n,
and all their entries are the same: $a_{ij} = b_{ij}$ for i = 1, …, m and j = 1, …, n.

A  general  linear  system  of  m  equations  in  n  unknowns  will  take  the  form 

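$$
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1, \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2, \\
& \;\; \vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m.
\end{aligned} \tag{1.7}
$$
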

As such, it is composed of three basic ingredients: the m × n coefficient matrix A, with
entries $a_{ij}$ as in (1.6), the column vector

$$
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
$$

containing the unknowns, and the column vector

$$
\mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}
$$

containing right-hand sides. In our previous example,

x + 2y + z = 2,
2x + 6y + z = 7,
x + y + 4z = 3,

the coefficient matrix

$$
A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}
$$

can be filled in, entry by entry, from the coefficients of the variables appearing in the
equations; if a variable does not appear in an equation, the corresponding matrix entry is 0.
The vector $\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$ lists the variables, while the entries of
$\mathbf{b} = \begin{pmatrix} 2 \\ 7 \\ 3 \end{pmatrix}$ are the right-hand sides of the equations.



‡ In tensor analysis, [1], a sub- and super-script notation is adopted, with $a^i_j$ denoting the (i, j)
entry of the matrix A. This has certain advantages, but, to avoid possible confusion with powers,
we shall stick with the simpler subscript notation throughout this text.

Remark. We will consistently use bold face lower case letters to denote vectors, and
ordinary  capital  letters  to  denote  general  matrices. 


Exercises 


1.2.1. Let $A = \begin{pmatrix} -2 & 0 & 1 & 3 \\ -1 & 2 & 7 & -5 \\ 6 & -6 & -3 & 4 \end{pmatrix}$. (a) What is the size of A? (b) What is its (2, 3) entry?
(c) (3, 1) entry? (d) 1st row? (e) 2nd column?

1.2.2.  Write  down  examples  of  (a)  a  3  x  3  matrix;  (b)  a  2  x  3  matrix;  (c)  a  matrix  with  3  rows 
and  4  columns;  (d)  a  row  vector  with  4  entries;  (e)  a  column  vector  with  3  entries; 

(f)  a  matrix  that  is  both  a  row  vector  and  a  column  vector. 


1.2.3. For which values of x, y, z, w are the matrices $\begin{pmatrix} x + y & x - z \\ y + w & x + 2w \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}$ equal?


1.2.4.  For  each  of  the  systems  in  Exercise  1.1.1,  write  down  the  coefficient  matrix  A  and  the 
vectors  x  and  b. 

1.2.5.  Write  out  and  solve  the  linear  systems  corresponding  to  the  indicated  matrix,  vector  of 


unknowns,  and  right-hand  side,  (a)  A  = 


(b)  A  = 


1 

3 


x 


(1  0  1\ 

(  u ^ 

/-1\ 

110 

,  X  = 

V 

,  b  = 

-1 

\0  1  l) 

{ w) 

l  2  / 

/ 


X 


X 
X , 


b  = 


\x: 


( d )  A  = 


(c)  A  = 


/  1  1-1  -1\ 
-10  12 
1-110 
VO  2-1  1) 


Matrix  Arithmetic 

Matrix  arithmetic  involves  three  basic  operations:  matrix  addition ,  scalar  multiplication , 
and  matrix  multiplication.  First  we  define  addition  of  matrices.  You  are  allowed  to  add 
two  matrices  only  if  they  are  of  the  same  size ,  and  matrix  addition  is  performed  entry  by 
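entry. For example,

$$
\begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} + \begin{pmatrix} 3 & -5 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} 4 & -3 \\ 1 & 1 \end{pmatrix}.
$$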


Therefore, if A and B are m × n matrices, their sum C = A + B is the m × n matrix whose
entries are given by $c_{ij} = a_{ij} + b_{ij}$ for i = 1, …, m and j = 1, …, n. When defined, matrix
addition is commutative, A + B = B + A, and associative, A + (B + C) = (A + B) + C,
just like ordinary addition.

A  scalar  is  a  fancy  name  for  an  ordinary  number  —  the  term  merely  distinguishes  it 
from  a  vector  or  a  matrix.  For  the  time  being,  we  will  restrict  our  attention  to  real  scalars 
and  matrices  with  real  entries,  but  eventually  complex  scalars  and  complex  matrices  must 
be dealt with. We will consistently identify a scalar c ∈ ℝ with the 1 × 1 matrix (c) in
which it is the sole entry, and so will omit the redundant parentheses in the latter case.
Scalar multiplication takes a scalar c and an m × n matrix A and computes the m × n
matrix B = cA by multiplying each entry of A by c. For example,

$$
3 \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 6 \\ -3 & 0 \end{pmatrix}.
$$

In general, $b_{ij} = c\,a_{ij}$ for i = 1, …, m and j = 1, …, n. Basic properties of scalar
multiplication are summarized at the end of this section.
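Both entrywise operations translate directly into code. The sketch below stores a matrix as a list of rows; the helper names `mat_add` and `scalar_mul` and the sample matrices are illustrative choices, not notation from the text.

```python
def mat_add(A, B):
    """Entry-by-entry sum; defined only when A and B have the same size."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]

A = [[1, 2], [-1, 0]]
B = [[3, -5], [2, 1]]
total = mat_add(A, B)
tripled = scalar_mul(3, A)
print(total, tripled)
```

Attempting `mat_add` on matrices of different sizes raises an error, mirroring the rule that such a sum is not defined.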

Finally, we define matrix multiplication. First, the product of a row vector a and a
column vector x having the same number of entries is the scalar or 1 × 1 matrix defined
by the following rule:

$$
\mathbf{a}\,\mathbf{x} = (a_1 \;\; a_2 \;\; \cdots \;\; a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = \sum_{k=1}^{n} a_k x_k. \tag{1.8}
$$

More generally, if A is an m × n matrix and B is an n × p matrix, so that the number of
columns in A equals the number of rows in B, then the matrix product C = AB is defined
as the m × p matrix whose (i, j) entry equals the vector product of the ith row of A and
the jth column of B. Therefore,

$$
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}. \tag{1.9}
$$

Note  that  our  restriction  on  the  sizes  of  A  and  B  guarantees  that  the  relevant  row  and 
column  vectors  will  have  the  same  number  of  entries,  and  so  their  product  is  defined. 
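Formula (1.9) can be transcribed almost literally. The following Python sketch uses a hypothetical helper `mat_mul` with matrices stored as lists of rows, and rejects incompatible sizes; the sample matrices are arbitrary.

```python
def mat_mul(A, B):
    """Return C = AB with c_ij = sum over k of a_ik * b_kj, as in (1.9)."""
    n = len(B)  # number of rows of B
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(p)]
            for row in A]

# A row vector times a column vector gives a 1 x 1 matrix, as in (1.8).
row_times_column = mat_mul([[1, 2, 3]], [[4], [5], [6]])

# A 2 x 2 example.
product = mat_mul([[1, 2], [3, 4]], [[0, 1], [-1, 2]])
print(row_times_column, product)
```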

For example, the product of the coefficient matrix A and vector of unknowns x for our
original system (1.1) is given by

$$
A\mathbf{x} = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + 2y + z \\ 2x + 6y + z \\ x + y + 4z \end{pmatrix}.
$$

The result is a column vector whose entries reproduce the left-hand sides of the original
linear system! As a result, we can rewrite the system

$$
A\mathbf{x} = \mathbf{b} \tag{1.10}
$$


as  an  equality  between  two  column  vectors.  This  result  is  general;  a  linear  system  (1.7) 
consisting  of  m  equations  in  n  unknowns  can  be  written  in  the  matrix  form  (1.10),  where  A 
is the m × n coefficient matrix (1.6), x is the n × 1 column vector of unknowns, and b is the
m  x  1  column  vector  containing  the  right-hand  sides.  This  is  one  of  the  principal  reasons 
for  the  non-evident  definition  of  matrix  multiplication.  Component-wise  multiplication  of 
matrix  entries  turns  out  to  be  almost  completely  useless  in  applications. 
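As a quick numerical check of the matrix form (1.10), the sketch below multiplies the coefficient matrix of (1.1) by the solution (1.5) and compares the result with the right-hand side. The helper name `mat_vec` is hypothetical, and vectors are stored as flat lists for brevity.

```python
def mat_vec(A, x):
    """Product of a matrix A (list of rows) with a column vector x (flat list)."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

A = [[1, 2, 1],
     [2, 6, 1],
     [1, 1, 4]]
b = [2, 7, 3]
x = [-3, 2, 1]  # the solution (1.5)

lhs = mat_vec(A, x)
residual = [l - r for l, r in zip(lhs, b)]
print(lhs, residual)
```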

Now, the bad news. Matrix multiplication is not commutative; that is, BA is not
necessarily equal to AB. For example, BA may not be defined even when AB is. Even if
both are defined, they may be different sized matrices. For example, the product s = r c
of a row vector r, a 1 × n matrix, and a column vector c, an n × 1 matrix with the same
number of entries, is a 1 × 1 matrix, or scalar, whereas the reversed product C = c r is an
n × n matrix. For instance,

$$
(1 \;\; 2) \begin{pmatrix} 3 \\ 0 \end{pmatrix} = 3, \qquad \text{whereas} \qquad \begin{pmatrix} 3 \\ 0 \end{pmatrix} (1 \;\; 2) = \begin{pmatrix} 3 & 6 \\ 0 & 0 \end{pmatrix}.
$$

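In computing the latter product, don’t forget that we multiply the rows of the first matrix
by the columns of the second, each of which has but a single entry. Moreover, even if
the matrix products AB and BA have the same size, which requires both A and B to be
square matrices, we may still have AB ≠ BA. For example,

$$
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} -2 & 5 \\ -4 & 11 \end{pmatrix}, \qquad \text{whereas} \qquad \begin{pmatrix} 0 & 1 \\ -1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 5 & 6 \end{pmatrix}.
$$
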

On the other hand, matrix multiplication is associative, so A(BC) = (AB)C whenever
A has size m × n, B has size n × p, and C has size p × q; the result is a matrix of
size m × q. The proof of associativity is a tedious computation based on the definition of
matrix multiplication that, for brevity, we omit.§ Consequently, the one difference between
matrix algebra and ordinary algebra is that you need to be careful not to change the order
of multiplicative factors without proper justification.
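Both facts are easy to observe numerically. The Python sketch below uses a hypothetical `mat_mul` helper and arbitrarily chosen sample matrices; it illustrates non-commutativity and checks associativity on one instance only, which is of course not a proof.

```python
def mat_mul(A, B):
    """Rows-times-columns matrix product for matrices stored as lists of rows."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("incompatible sizes")
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for row in A]

A = [[1, 2], [3, 4]]
B = [[0, 1], [-1, 2]]
C = [[1, 0], [2, -1]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)                      # differs from AB

left = mat_mul(mat_mul(A, B), C)        # (AB)C
right = mat_mul(A, mat_mul(B, C))       # A(BC)

r = [[1, 2]]                            # 1 x 2 row vector
c = [[3], [0]]                          # 2 x 1 column vector
rc = mat_mul(r, c)                      # 1 x 1 matrix
cr = mat_mul(c, r)                      # 2 x 2 matrix
print(AB, BA, left == right, rc, cr)
```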

Since  matrix  multiplication  acts  by  multiplying  rows  by  columns,  one  can  compute  the 
columns  in  a  matrix  product  AB  by  multiplying  the  matrix  A  and  the  individual  columns 
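of B. For example, the two columns of the matrix product

$$
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} -2 & 5 \\ -4 & 11 \end{pmatrix}
$$

are obtained by multiplying the first matrix with the individual columns of the second:

$$
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 \\ -1 \end{pmatrix} = \begin{pmatrix} -2 \\ -4 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 11 \end{pmatrix}.
$$
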

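In general, if we use $\mathbf{b}_k$ to denote the kth column of B, then

$$
AB = A\,(\mathbf{b}_1 \;\; \mathbf{b}_2 \;\; \cdots \;\; \mathbf{b}_p) = (A\mathbf{b}_1 \;\; A\mathbf{b}_2 \;\; \cdots \;\; A\mathbf{b}_p), \tag{1.11}
$$

indicating that the kth column of their matrix product is $A\mathbf{b}_k$.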
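The column-by-column description (1.11) can be checked against the entrywise definition. In this Python sketch the helpers `mat_mul`, `column`, and `from_columns` are hypothetical names, and the 2 × 3 and 3 × 2 sample matrices are arbitrary.

```python
def mat_mul(A, B):
    """Rows-times-columns matrix product for matrices stored as lists of rows."""
    n = len(B)
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for row in A]

def column(B, k):
    """The kth column of B (0-based), as an n x 1 matrix."""
    return [[row[k]] for row in B]

def from_columns(cols):
    """Assemble a matrix from a list of m x 1 column matrices."""
    return [[col[i][0] for col in cols] for i in range(len(cols[0]))]

A = [[1, -1, 2],
     [2, 0, -2]]
B = [[3, 4],
     [0, 2],
     [-1, 1]]

# Compute each column A b_k separately, then place the columns side by side.
by_columns = from_columns([mat_mul(A, column(B, k)) for k in range(len(B[0]))])
direct = mat_mul(A, B)
print(by_columns, direct)
```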

There are two important special matrices. The first is the zero matrix, all of whose
entries are 0. We use $O_{m \times n}$ to denote the m × n zero matrix, often written as just O if the
size is clear from the context. The zero matrix is the additive unit, so A + O = A = O + A
when O has the same size as A. In particular, we will use a bold face 0 to denote a column
vector with all zero entries, i.e., $O_{n \times 1}$.

The role of the multiplicative unit is played by the square identity matrix

$$
I = I_n = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}
$$

of size n × n. The entries along the main diagonal, which runs from top left to bottom
right, are equal to 1, while the off-diagonal entries are all 0. As you can check, if A is
any m × n matrix, then $I_m A = A = A I_n$. We will sometimes write the preceding equation
as just I A = A = A I, since each matrix product is well-defined for exactly one size of
identity matrix.

§ A much simpler, but more abstract, proof can be found in Exercise 7.1.45.


Basic Matrix Arithmetic

Matrix Addition:
    Commutativity       A + B = B + A
    Associativity       (A + B) + C = A + (B + C)
    Zero Matrix         A + O = A = O + A
    Additive Inverse    A + (-A) = O,  -A = (-1)A

Scalar Multiplication:
    Associativity       c(dA) = (cd)A
    Distributivity      c(A + B) = (cA) + (cB)
                        (c + d)A = (cA) + (dA)
    Unit Scalar         1A = A
    Zero Scalar         0A = O

Matrix Multiplication:
    Associativity       (AB)C = A(BC)
    Distributivity      A(B + C) = AB + AC
                        (A + B)C = AC + BC
    Compatibility       c(AB) = (cA)B = A(cB)
    Identity Matrix     A I = A = I A
    Zero Matrix         A O = O,  O A = O

The identity matrix is a particular example of a diagonal matrix. In general, a square
matrix A is diagonal if all its off-diagonal entries are zero: $a_{ij} = 0$ for all i ≠ j. We will
sometimes write $D = \operatorname{diag}(c_1, \ldots, c_n)$ for the n × n diagonal matrix with diagonal entries
$d_{ii} = c_i$. Thus, diag(1, 3, 0) refers to the diagonal matrix

$$
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix},
$$

while the 4 × 4 identity matrix can be written as

$$
I_4 = \operatorname{diag}(1, 1, 1, 1) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
$$



Let  us  conclude  this  section  by  summarizing  the  basic  properties  of  matrix  arithmetic. 
In  the  accompanying  table,  A,B,C  are  matrices;  c,  d  are  scalars;  O  is  a  zero  matrix;  and 
I  is  an  identity  matrix.  All  matrices  are  assumed  to  have  the  correct  sizes  so  that  the 
indicated  operations  are  defined. 
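A few of the tabulated identities can be spot-checked in code. The Python sketch below defines hypothetical helpers `diag`, `identity`, `zeros`, and `mat_mul`, and tests the identity and zero matrix rules on one arbitrary 2 × 3 matrix; a numerical check of this kind illustrates the rules but does not prove them.

```python
def mat_mul(A, B):
    """Rows-times-columns matrix product for matrices stored as lists of rows."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("incompatible sizes")
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for row in A]

def diag(*entries):
    """The square diagonal matrix with the given diagonal entries."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

def identity(n):
    """The n x n identity matrix, i.e. diag(1, ..., 1)."""
    return diag(*([1] * n))

def zeros(m, n):
    """The m x n zero matrix."""
    return [[0] * n for _ in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]                     # a 2 x 3 matrix

left_identity = mat_mul(identity(2), A)    # I_2 A
right_identity = mat_mul(A, identity(3))   # A I_3
zero_product = mat_mul(A, zeros(3, 4))     # A O, here a 2 x 4 zero matrix
print(diag(1, 3, 0), left_identity == A, right_identity == A, zero_product)
```

Note that the two identity matrices have different sizes, 2 × 2 on the left and 3 × 3 on the right, exactly as the equation $I_m A = A = A I_n$ requires.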


Exercises 


1.2.6. (a) Write down the 5 × 5 identity and zero matrices. (b) Write down their sum and
their product. Does the order of multiplication matter?
1.2.7. Consider the matrices A =


B  = 


-6 

4 


0 

2 


3 

1 


(  2 

3  \ 

,  c  = 

-3 

-4 

1 

^  1 

2  ) 

AB ,  (c)  BA, 

(d) (A + B)C, (e) A + BC, (f) A + 2CB, (g) BCB − I, (h) $A^2 - 3A + I$, (i) (B − I)(C + I).


1.2.8.  Which  of  the  following  pairs  of  matrices  commute  under  matrix  multiplication? 


(a) 


1 

-2 


2 

1 


2  3 
5  0 


(b) 


(3  -1\ 

0  2  , 

\1  4/ 

/  1 
5 


1.2.9.  List  the  diagonal  entries  of  A  = 


9 
V  13 


4 

5 

2 

6 

10 

14 


2 

2 


3 

7 

11 

15 


2 
4 

4\ 
8 
12 
16/ 


(c) 


/ 


V 


3 

-2 

2 


/  2 
1 

V2 


0 

1 

0 


1.2.10.  Write  out  the  following  diagonal  matrices:  (a)  diag  (1,  0,  —1),  (b)  diag  (2,  —2,  3,  —3). 

1.2.11.  True  or  false:  (a)  The  sum  of  two  diagonal  matrices  of  the  same  size  is  a  diagonal 
matrix,  (b)  The  product  is  also  diagonal. 

♦ 1.2.12. (a) Show that if $D = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$ is a 2 × 2 diagonal matrix with a ≠ b, then the only
matrices that commute (under matrix multiplication) with D are other 2 × 2 diagonal
matrices. (b) What if a = b? (c) Find all matrices that commute with D =


/  a 

0 

°\ 

0 

b 

0 

\0 

0 

CJ 

a  7^ 

b 

=  c. 

(e)  Prove  that  a  matrix  A  commutes  with  an  n  x  n  diagonal  matrix  D  with  all  distinct 
diagonal  entries  if  and  only  if  A  is  a  diagonal  matrix. 


1.2.13.  Show  that  the  matrix  products  AB  and  BA  have  the  same  size  if  and  only  if  A  and  B 
are  square  matrices  of  the  same  size. 

1.2.14.  Find  all  matrices  B  that  commute  (under  matrix  multiplication)  with  A  = 

o  o  o 

1.2.15. (a) Show that, if A, B are commuting square matrices, then $(A + B)^2 = A^2 + 2AB + B^2$.
(b) Find a pair of 2 × 2 matrices A, B such that $(A + B)^2 \neq A^2 + 2AB + B^2$.

1.2.16.  Show  that  if  the  matrices  A  and  B  commute,  then  they  necessarily  are  both  square  and 
the  same  size. 


1.2.17.  Let  A  be  an  m  x  n  matrix.  What  are  the  permissible  sizes  for  the  zero  matrices 
appearing  in  the  identities  A  O  =  O  and  O  A  =  O? 

1.2.18.  Let  A  be  an  m  x  n  matrix  and  let  c  be  a  scalar.  Show  that  if  cA  =  O,  then  either  c  =  0 
or  A  =  O. 

1.2.19.  True  or  false:  If  AB  =  O  then  either  A  =  O  or  B  =  O. 

1.2.20.  True  or  false:  If  A,  B  are  square  matrices  of  the  same  size,  then 

$A^2 - B^2 = (A + B)(A - B)$.

1.2.21. Prove that A v = 0 for every vector v (with the appropriate number of entries) if and
only  if  A  =  O  is  the  zero  matrix.  Hint:  If  you  are  stuck,  first  try  to  find  a  proof  when  A  is 
a  small  matrix,  e.g.,  of  size  2x2. 

1.2.22. (a) Under what conditions is the square $A^2$ of a matrix defined? (b) Show that A and
$A^2$ commute. (c) How many matrix multiplications are needed to compute $A^n$?

1.2.23. Find a nonzero matrix A ≠ O such that $A^2 = O$.

♦ 1.2.24. Let A have a row all of whose entries are zero. (a) Explain why the product AB also
has  a  zero  row.  (b)  Find  an  example  where  BA  does  not  have  a  zero  row. 
1.2.25. (a) Find all solutions $X = \begin{pmatrix} x & y \\ z & w \end{pmatrix}$ to the matrix equation AX = I when


A  = 


2 

3 


.  ( b )  Find  all  solutions  to  X  A  =  I .  Are  they  the  same? 


1.2.26.  (a)  Find  all  solutions  X 


_  (  x  y 

z  w 


to  the  matrix  equation  AX  =  B  when 


A  = 


0  1 
1  3 


and  B  = 


1  2 
1  1 


.  (b)  Find  all  solutions  to  X  A  =  B.  Are  they  the  same? 


1.2.27.  (a)  Find  all  solutions  X  =  ^  to  the  matrix  equation  A  A  =  X  B  when 


1 

1 


A  =  (  ?  q  J  and  R  =  ^  ^  ^  J .  (b)  Can  you  find  a  pair  of  nonzero  matrices  A  ^  B  such 

that  the  matrix  equation  AX  =  X B  has  a  nonzero  solution  X  /  O? 

1.2.28.  Let  A  be  a  matrix  and  c  a  scalar.  Find  all  solutions  to  the  matrix  equation  cA  =  I. 

♦ 1.2.29. Let e be the 1 × m row vector all of whose entries are equal to 1. (a) Show that if
A is an m × n matrix, then the jth entry of the product v = eA is the jth column sum
of A, meaning the sum of all the entries in its jth column. (b) Let W denote the m × m
matrix whose diagonal entries are equal to (m − 1)/m and whose off-diagonal entries are all
equal to −1/m. Prove that the column sums of B = WA are all zero. (c) Check both results
when $A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 1 & 3 \\ -4 & 5 & -1 \end{pmatrix}$.

Remark. If the rows of A represent experimental data
values, then the entries of (1/m) eA represent the means or averages of the data values, while
B = WA corresponds to data that has been normalized to have mean 0; see Section 8.8.

♦ 1.2.30. The commutator of two matrices A, B, is defined to be the matrix

C  =  [A,B]  =  AB  -  BA.  (1.12) 

(a)  Explain  why  [  A,  B  ]  is  defined  if  and  only  if  A  and  B  are  square  matrices  of  the 
same  size,  (b)  Show  that  A  and  B  commute  under  matrix  multiplication  if  and  only  if 
[  A,  B  ]  =  O.  (c)  Compute  the  commutator  of  the  following  matrices: 

/  0  -1  0\  /I  0  0 


(0 


1 

1 


0 

1 


2 

2 


1 

0 


(w) 


1 

3 


3 

1 


1  7 
7  1 


(in) 


1 

VO 


0  0 

0  1) 


0  0-1 

Vo  i  o 


(d) Prove that the commutator is (i) Bilinear: [cA + dB, C] = c[A, C] + d[B, C]
and [A, cB + dC] = c[A, B] + d[A, C] for any scalars c, d; (ii) Skew-symmetric:
[A, B] = −[B, A]; (iii) satisfies the Jacobi identity:

[[A, B], C] + [[C, A], B] + [[B, C], A] = O,

for any square matrices A, B, C of the same size.

Remark.  The  commutator  plays  a  very  important  role  in  geometry,  symmetry,  and 
quantum  mechanics.  See  Section  10.4  as  well  as  [54,  60,  93]  for  further  developments. 


♦ 1.2.31. The trace of an n × n matrix $A \in M_{n \times n}$ is defined to be the sum of its diagonal entries:
$\operatorname{tr} A = a_{11} + a_{22} + \cdots + a_{nn}$. (a) Compute the trace of
(i) $\begin{pmatrix} 1 & -1 \\ 2 & 3 \end{pmatrix}$, (ii) $\begin{pmatrix} 1 & 3 & 2 \\ -1 & 0 & 1 \\ -4 & 3 & -1 \end{pmatrix}$.
(b) Prove that tr(A + B) = tr A + tr B. (c) Prove that tr(AB) = tr(BA). (d) Prove that
the commutator matrix C = AB − BA has zero trace: tr C = 0. (e) Is part (c) valid if A
has size m × n and B has size n × m? (f) Prove that tr(ABC) = tr(CAB) = tr(BCA).
On the other hand, find an example where tr(ABC) ≠ tr(ACB).
♦ 1.2.32. Prove that matrix multiplication is associative: A(BC) = (AB)C when defined.

♦ 1.2.33. Justify the following alternative formula for multiplying a matrix A and a column
vector x:

$$
A\mathbf{x} = x_1 \mathbf{c}_1 + x_2 \mathbf{c}_2 + \cdots + x_n \mathbf{c}_n, \tag{1.13}
$$

where $\mathbf{c}_1, \ldots, \mathbf{c}_n$ are the columns of A and $x_1, \ldots, x_n$ the entries of x.

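♥ 1.2.34. The basic definition of matrix multiplication AB tells us to multiply rows of A by
columns of B. Remarkably, if you suitably interpret the operation, you can also compute
AB by multiplying columns of A by rows of B! Suppose A is an m × n matrix with columns
$\mathbf{c}_1, \ldots, \mathbf{c}_n$. Suppose B is an n × p matrix with rows $\mathbf{r}_1, \ldots, \mathbf{r}_n$. Then we claim that

$$
AB = \mathbf{c}_1 \mathbf{r}_1 + \mathbf{c}_2 \mathbf{r}_2 + \cdots + \mathbf{c}_n \mathbf{r}_n, \tag{1.14}
$$

where each summand is a matrix of size m × p. (a) Verify that the particular case

$$
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 2 & 3 \end{pmatrix}
= \begin{pmatrix} 1 \\ 3 \end{pmatrix} (0 \;\; {-1}) + \begin{pmatrix} 2 \\ 4 \end{pmatrix} (2 \;\; 3)
= \begin{pmatrix} 0 & -1 \\ 0 & -3 \end{pmatrix} + \begin{pmatrix} 4 & 6 \\ 8 & 12 \end{pmatrix}
= \begin{pmatrix} 4 & 5 \\ 8 & 9 \end{pmatrix}
$$
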


agrees  with  the  usual  method  for  computing  the  matrix  product,  (b)  Use  this  method  to 

2  5  \ 


compute  the  matrix  products  (i) 


(Hi) 


-2 

3 


1 

2 


1  -2 

1  0 


(ii) 


1  -2 
3  -1 


3 

-1 

(2 

3 

0\ 

-1 

2 

i 

3 

-1 

4 

1 

1 

~5 

^0 

4 

1/ 

,  and  verify  that  you  get  the  same  answer  as  that 


obtained  by  the  traditional  method,  (c)  Explain  why  (1.13)  is  a  special  case  of  (1.14). 
(d)  Prove  that  (1.14)  gives  the  correct  formula  for  the  matrix  product. 


♥ 1.2.35. Matrix polynomials. Let $p(x) = c_n x^n + c_{n-1} x^{n-1} + \cdots + c_1 x + c_0$ be a polynomial
function. If A is a square matrix, we define the corresponding matrix polynomial $p(A) =
c_n A^n + c_{n-1} A^{n-1} + \cdots + c_1 A + c_0 I$; the constant term becomes a scalar multiple of the
identity matrix. For instance, if $p(x) = x^2 - 2x + 3$, then $p(A) = A^2 - 2A + 3I$. (a) Write out
the  matrix  polynomials  p(A ),  q(A)  when  p(x)  =  x  —  3x  +  2,  q(x)  =  2x  +1.  (b)  Evaluate 

(  \  2  \ 

p(A)  and  q(A)  when  A  =  (  ^  ^  1 .  (c)  Show  that  the  matrix  product  p(A)  q(A)  is  the 

matrix  polynomial  corresponding  to  the  product  polynomial  r(x)  =  p(x)q(x).  (d)  True  or 
false :  If  B  =  p(A)  and  C  =  q(A),  then  BC  =  CB.  Check  your  answer  in  the  particular 
case  of  part  (b). 

T  1.2.36.  A  block  matrix  has  the  form  M  =  in  which  A,  B,C,  D  are  matrices  with 

respective  sizes  i  x  k,  i  x  l,  j  x  k,  j  x  l.  (a)  What  is  the  size  of  M?  (b)  Write  out  the 


block  matrix  M  when  A 


1 

3 


B  = 


1  -1 

0  1 


c  = 


( 

-2 

\  iy 


D  = 


(c) Show that if $N = \begin{pmatrix} P & Q \\ R & S \end{pmatrix}$ is a block matrix whose blocks have the same size as those
of $M$, then $M + N = \begin{pmatrix} A + P & B + Q \\ C + R & D + S \end{pmatrix}$, i.e., matrix addition can be done in blocks.
(d) Show that if $P = \begin{pmatrix} X & Y \\ Z & W \end{pmatrix}$ has blocks of a compatible size, the matrix product is

\[ MP = \begin{pmatrix} AX + BZ & AY + BW \\ CX + DZ & CY + DW \end{pmatrix}. \]

Explain what “compatible” means. (e) Write down a compatible block matrix $P$ for the matrix $M$
in part (b). Then validate the block matrix product identity of part (d) for your chosen matrices.

T 1.2.37. The matrix $S$ is said to be a square root of the matrix $A$ if $S^2 = A$. (a) Show that
$S = \begin{pmatrix} 1 & 1 \\ 3 & -1 \end{pmatrix}$ is a square root of the matrix $A = \begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix}$. Can you find another square
root of $A$? (b) Explain why only square matrices can have a square root. (c) Find all real
square roots of the $2 \times 2$ identity matrix $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. (d) Does $-I = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ have a
real square root?


1.3  Gaussian  Elimination  —  Regular  Case 


With  the  basic  matrix  arithmetic  operations  in  hand,  let  us  now  return  to  our  primary 
task.  The  goal  is  to  develop  a  systematic  method  for  solving  linear  systems  of  equations. 
While  we  could  continue  to  work  directly  with  the  equations,  matrices  provide  a  convenient 
alternative  that  begins  by  merely  shortening  the  amount  of  writing,  but  ultimately  leads 
to  profound  insight  into  the  structure  of  linear  systems  and  their  solutions. 

We  begin  by  replacing  the  system  (1.7)  by  its  matrix  constituents.  It  is  convenient  to 
ignore  the  vector  of  unknowns,  and  form  the  augmented  matrix 


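\[ M = (A \mid b) = \left(\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{array}\right), \tag{1.15} \]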


which  is  an  m  x  (n  +  1)  matrix  obtained  by  tacking  the  right-hand  side  vector  onto  the 
original  coefficient  matrix.  The  extra  vertical  line  is  included  just  to  remind  us  that  the 
last  column  of  this  matrix  plays  a  special  role.  For  example,  the  augmented  matrix  for  the 
system  (1.1),  i.e., 


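\[ \begin{aligned} x + 2y + z &= 2, \\ 2x + 6y + z &= 7, \\ x + y + 4z &= 3, \end{aligned} \]

is

\[ M = \left(\begin{array}{ccc|c} 1 & 2 & 1 & 2 \\ 2 & 6 & 1 & 7 \\ 1 & 1 & 4 & 3 \end{array}\right). \tag{1.16} \]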


Note  that  one  can  immediately  recover  the  equations  in  the  original  linear  system  from 
the  augmented  matrix.  Since  operations  on  equations  also  affect  their  right-hand  sides, 
keeping  track  of  everything  is  most  easily  done  through  the  augmented  matrix. 

For  the  time  being,  we  will  concentrate  our  efforts  on  linear  systems  that  have  the  same 
number,  n,  of  equations  as  unknowns.  The  associated  coefficient  matrix  A  is  square,  of 
size  n  x  n.  The  corresponding  augmented  matrix  M  =  ( A  |  b)  then  has  size  n  x  (n  +  1). 

The  matrix  operation  that  assumes  the  role  of  Linear  System  Operation  #1  is: 


Elementary  Row  Operation  #1: 


Add  a  scalar  multiple  of  one  row  of  the  augmented  matrix  to  another  row. 


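For example, if we add $-2$ times the first row of the augmented matrix (1.16) to the second
row, the result is the row vector

\[ -2\,(\,1 \;\; 2 \;\; 1 \;\; 2\,) + (\,2 \;\; 6 \;\; 1 \;\; 7\,) = (\,0 \;\; 2 \;\; {-1} \;\; 3\,). \]

The result can be recognized as the second row of the modified augmented matrix

\[ \left(\begin{array}{ccc|c} 1 & 2 & 1 & 2 \\ 0 & 2 & -1 & 3 \\ 1 & 1 & 4 & 3 \end{array}\right) \tag{1.17} \]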
that corresponds to the first equivalent system (1.2). When elementary row operation #1
is performed, it is critical that the result replaces the row being added to, not the row
being multiplied by the scalar. Notice that the elimination of a variable in an equation
(in this case, the first variable in the second equation) amounts to making its entry in the
coefficient matrix equal to zero.
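
As a small illustration of Elementary Row Operation #1, the following Python sketch applies it to the augmented matrix (1.16). The helper name `add_row_multiple` and the 0-based row indices are our own choices for illustration, not notation from the text; exact rational arithmetic is used so that no rounding occurs.

```python
from fractions import Fraction

def add_row_multiple(M, c, src, dst):
    """Return a copy of M in which c times row `src` has been added to row `dst`.

    Rows are indexed from 0, so the text's "row 1" is index 0.
    The result replaces row `dst` (the row being added to), never row `src`.
    """
    result = [row[:] for row in M]  # copy, so the input matrix is left untouched
    result[dst] = [Fraction(c) * a + b for a, b in zip(M[src], M[dst])]
    return result

# Augmented matrix (1.16) of the system x + 2y + z = 2, 2x + 6y + z = 7, x + y + 4z = 3.
M = [[1, 2, 1, 2],
     [2, 6, 1, 7],
     [1, 1, 4, 3]]

# Add -2 times the first row to the second row, giving the matrix (1.17).
M1 = add_row_multiple(M, -2, src=0, dst=1)
```

Only the second row changes; its first entry becomes zero, which is exactly the elimination of $x$ from the second equation.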

We  shall  call  the  (1,1)  entry  of  the  coefficient  matrix  the  first  pivot.  The  precise 
definition  of  pivot  will  become  clear  as  we  continue;  the  one  key  requirement  is  that  a 
pivot  must  always  be  nonzero.  Eliminating  the  first  variable  x  from  the  second  and  third 
equations  amounts  to  making  all  the  matrix  entries  in  the  column  below  the  pivot  equal  to 
zero. We have already done this with the (2, 1) entry in (1.17). To make the (3, 1) entry
equal to zero, we subtract (that is, add $-1$ times) the first row from the last row. The
resulting augmented matrix is

\[ \left(\begin{array}{ccc|c} 1 & 2 & 1 & 2 \\ 0 & 2 & -1 & 3 \\ 0 & -1 & 3 & 1 \end{array}\right), \]


which corresponds to the system (1.3). The second pivot is the (2, 2) entry of this matrix,
which is 2, and is the coefficient of the second variable in the second equation. Again, the
pivot must be nonzero. We use the elementary row operation of adding $\frac12$ of the second
row to the third row to make the entry below the second pivot equal to 0; the result is the
augmented matrix

\[ N = \left(\begin{array}{ccc|c} 1 & 2 & 1 & 2 \\ 0 & 2 & -1 & 3 \\ 0 & 0 & \frac52 & \frac52 \end{array}\right) \]

that corresponds to the triangular system (1.4). We write the final augmented matrix as
$N = (U \mid c)$, where

\[ U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & 0 & \frac52 \end{pmatrix}, \qquad c = \begin{pmatrix} 2 \\ 3 \\ \frac52 \end{pmatrix}. \]

The corresponding linear system has vector form

\[ U x = c. \tag{1.18} \]


Its coefficient matrix $U$ is upper triangular, which means that all its entries below the
main diagonal are zero: $u_{ij} = 0$ whenever $i > j$. The three nonzero entries on its diagonal,
$1, 2, \frac52$, including the last one in the (3, 3) slot, are the three pivots. Once the system has
been reduced to triangular form (1.18), we can easily solve it by Back Substitution.

The  preceding  algorithm  for  solving  a  linear  system  of  n  equations  in  n  unknowns  is 
known  as  regular  Gaussian  Elimination.  A  square  matrix  A  will  be  called  regular 1  if  the 
algorithm  successfully  reduces  it  to  upper  triangular  form  U  with  all  non-zero  pivots  on  the 
diagonal.  In  other  words,  for  regular  matrices,  as  the  algorithm  proceeds,  each  successive 
pivot  appearing  on  the  diagonal  must  be  nonzero;  otherwise,  the  matrix  is  not  regular. 
We  then  use  the  pivot  row  to  make  all  the  entries  lying  in  the  column  below  the  pivot 
equal  to  zero  through  elementary  row  operations.  The  solution  is  found  by  applying  Back 
Substitution  to  the  resulting  triangular  system. 


1  Strangely,  there  is  no  commonly  accepted  term  to  describe  this  kind  of  matrix.  For  lack  of  a 
better  alternative,  we  propose  to  use  the  adjective  “regular”  in  the  sequel. 
Gaussian Elimination, Regular Case

start
    for j = 1 to n
        if m_jj = 0, stop; print “A is not regular”
        else for i = j + 1 to n
            set l_ij = m_ij / m_jj
            add -l_ij times row j of M to row i of M
        next i
    next j
end


Let us state this algorithm in the form of a program, written in a general “pseudocode”
that can be easily translated into any specific language, e.g., C++, Fortran, Java,
Maple, Mathematica, Matlab. In accordance with the usual programming convention,
the same letter $M = (m_{ij})$ will be used to denote the current augmented matrix at
each stage in the computation, keeping in mind that its entries will change as the algorithm
progresses. We initialize $M = (A \mid b)$. The final output of the program, assuming $A$ is
regular, is the augmented matrix $M = (U \mid c)$, where $U$ is the upper triangular matrix
whose diagonal entries are the pivots, while $c$ is the resulting vector of right-hand sides in
the triangular system $U x = c$.
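
One possible translation of the regular Gaussian Elimination pseudocode into Python is sketched below. The function name `gaussian_elimination_regular`, the use of exact `Fraction` arithmetic, and the choice to raise an exception when a zero pivot appears are assumptions made for this illustration; the text itself only prescribes the row operations.

```python
from fractions import Fraction

def gaussian_elimination_regular(A, b):
    """Reduce the augmented matrix (A | b) to (U | c) using only row operation #1.

    A is a list of n rows of length n, b a list of n numbers.
    Raises ValueError if a zero pivot appears, i.e. if A is not regular.
    """
    n = len(A)
    # Build the augmented matrix M = (A | b) with exact rational entries.
    M = [[Fraction(a) for a in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for j in range(n):
        if M[j][j] == 0:
            raise ValueError("A is not regular")
        for i in range(j + 1, n):
            l_ij = M[i][j] / M[j][j]
            # Add -l_ij times row j of M to row i of M.
            M[i] = [m_i - l_ij * m_j for m_i, m_j in zip(M[i], M[j])]
    U = [row[:n] for row in M]   # upper triangular part
    c = [row[n] for row in M]    # modified right-hand side
    return U, c

# The running example (1.16).
U, c = gaussian_elimination_regular([[1, 2, 1], [2, 6, 1], [1, 1, 4]], [2, 7, 3])
```

For the running example this reproduces the matrix $U$ with pivots $1, 2, \frac52$ and the vector $c = (2, 3, \frac52)^T$ found above.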


For completeness, let us include the pseudocode program for Back Substitution. The
input to this program is the upper triangular matrix $U$ and the right-hand side vector $c$ that
results from the Gaussian Elimination pseudocode program, which produces $M = (U \mid c)$.
The output of the Back Substitution program is the solution vector $x$ to the triangular
system $U x = c$, which is the same as the solution to the original linear system $A x = b$.


Back Substitution

start
    set x_n = c_n / u_nn
    for i = n - 1 to 1 with increment -1
        set x_i = (1 / u_ii) * ( c_i - sum_{j = i+1}^{n} u_ij x_j )
    next i
end
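
The Back Substitution pseudocode can be sketched in Python along the following lines. The function name `back_substitution` and the 0-based indexing are illustrative assumptions; the loop runs from the last equation up to the first, exactly as in the pseudocode.

```python
from fractions import Fraction

def back_substitution(U, c):
    """Solve the upper triangular system U x = c, assuming all pivots u_ii are nonzero."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):            # i = n-1, ..., 1, 0 (0-based)
        if U[i][i] == 0:
            raise ValueError("zero pivot: cannot solve for this variable")
        # Sum of u_ij * x_j over the already computed unknowns j > i.
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(c[i]) - s) / U[i][i]
    return x

# Triangular system (1.18) obtained from the running example.
U = [[1, 2, 1], [0, 2, -1], [0, 0, Fraction(5, 2)]]
c = [2, 3, Fraction(5, 2)]
x = back_substitution(U, c)
```

Here the computed values are $z = 1$, $y = 2$, $x = -3$, which can be checked by substituting into the original three equations.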


Exercises 


1.3.1.  Solve  the  following  linear  systems  by  Gaussian  Elimination,  (a) 


? 
(b)


0 d ) 


6 

3 


1 

■2 


u 

V 


5 

5 


(c) 


2 

1 

2\ 

(  u\ 

f 

3  \ 

-1 

3 

3 

A  = 

-2 

V 

4 

-3 

0) 

w 

1 

1 

-1\ 

\ 

/  0\ 

2 

-1 

3 

q 

= 

3 

•) 

-1 

-1 

3  / 

\r  / 

/ 

\bj 

(2 

-3 

1 

1\ 

/  £  \ 

/-1\ 

(g) 

1 

3 

-1 

-2 

2 

1 

-1 

2 

V 

z 

— 

0 

5 

u 

3 

2 

1  / 

\W ) 

^  3  / 

(0 


1.3.2.  Write  out  the  augmented  matrix  for  the  following  linear  systems.  Then  solve  the  system 
by  first  applying  elementary  row  operations  of  type  #1  to  place  the  augmented  matrix  in 
upper  triangular  form,  followed  by  Back  Substitution. 

x-i  +  7  x0  =4,  3z  —  5  w  =  —1, 

(a)  o  n  o  (b)  „  ’  (c)  2  y 


X 


2x1  —  9x2  =  2. 


p  +  4g  —  2r  =  1, 
(d)  —  2p  —  3r  =  —7, 

3p  —  2q  -\-  2r  =  —1. 


2y  +  2:  =  0. 
82:  =  8. 


2  2:  +  re  =  8. 


£ 


(e) 


£< 


2x3  =  - 
-*4  =  2. 


— 3x2  +  2x3  =  0, 

— 4x1  +  7x4  =  —5. 


Ax  +  5y  +  9  2:  =  —9. 

—  £  +  3y  —  z  -\-  vo  =  —2. 
£  —  y  +  3  2:  —  a;  =  0, 
y  —  z  4  w  =  7, 
4£  —  y  +  2:  =  5. 


U) 


1.3.3.  For  each  of  the  following  augmented  matrices  write  out  the  corresponding  linear  system 
of  equations.  Solve  the  system  by  applying  Gaussian  Elimination  to  the  augmented  matrix. 


(a) 


3 

-4 


2 

3 


2 

1 


(b) 


1 

2 

0 

—3  \ 

.  (c) 

/  2 
-1 

-1 

2 

0 

-1 

0 

0 

0\ 

1 

-1 

2 

1 

—6 

0 

-1 

2 

-1 

1 

V-2 

0 

-3 

1/ 

^  0 

0 

-1 

2 

(V 

1.3.4.  Which  of  the  following  matrices  are  regular?  (a) 


2 

1 


1 

4 


(b) 


0 

3 


-1 

-2 


3  -2  1\ 

(  1  -2  3  \ 

/  1  3  -3  0\ 

(c) 

-1  4  -3 

.  (d) 

-2  4-l),  (e) 

-10-12 

3  3  —6  1 

v  3  -2  5/ 

co 

1—1 

to 

U  U  V/  -L 

\  2  3  — 3  5  / 

1.3.5.  The  techniques  that  are  developed  for  solving  linear  systems  are  also  applicable  to 
systems  with  complex  coefficients,  whose  solutions  may  also  be  complex.  Use  Gaussian 
Elimination  to  solve  the  following  complex  linear  systems. 

i£  +  (1  —  i )z  =  2 i , 

(a)  4  !  o:  (b)  2iy  +  (1  +  i)z  =  2, 

—  £  +  2  iy  +  i  2;  =  1  —  2  i . 

(1+  i )  £  +  iy+(2  +  2i)z  =  0, 

.  .  I  1  —  1  )  £  -f  Z  U  =  1  , 

(c)  ,  (d)  (1  -  i)£  +  2y  +  iz  =  0, 

(3  —  3 i )£  +  iy-l-  (3  —  lli)2?  =  6. 

1.3.6.  (a)  Write  down  an  example  of  a  system  of  5  linear  equations  in  5  unknowns  with  regular 
diagonal  coefficient  matrix,  (b)  Solve  your  system,  (c)  Explain  why  solving  a  system 
whose  coefficient  matrix  is  diagonal  is  very  easy. 

1.3.7. Find the equation of the parabola $y = a x^2 + b x + c$ that goes through the points
(1, 6), (2, 4), and (3, 0).

0  1.3.8.  A  linear  system  is  called  homogeneous  if  all  the  right-hand  sides  are  zero,  and  so  takes 
the  matrix  form  Ax  =  0.  Explain  why  the  solution  to  a  homogeneous  system  with  regular 
coefficient  matrix  is  x  =  0. 


—  i  £1  +  (1+  i )  x2  =  —  1, 

(1  —  i )  £1  +£2  =  —  3  i . 

(1  -  i)£  +  2y  =  i, 

—  i  £  +  (1  +  i  )y  =  -1. 
1.3.9. Under what conditions do two $2 \times 2$ upper triangular matrices commute?


1.3.10.  A  matrix  is  called  lower  triangular  if  all  entries  above  the  diagonal  are  zero.  Show  that 
a  matrix  is  both  lower  and  upper  triangular  if  and  only  if  it  is  a  diagonal  matrix. 

0 1.3.11. A square matrix is called strictly lower (upper) triangular if all entries on or above
(below) the main diagonal are 0. (a) Prove that every square matrix can be uniquely
written as a sum $A = L + D + U$, with $L$ strictly lower triangular, $D$ diagonal, and $U$


strictly  upper  triangular,  (b)  Decompose  A  = 


(  3 

1 

V-2 


1 

4 

0 


in  this  manner. 


0 1.3.12. A square matrix $N$ is called nilpotent if $N^k = O$ for some $k > 1$.
(a) Show that $N = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$ is nilpotent. (b) Show that every strictly upper triangular
matrix, as defined in Exercise 1.3.11, is nilpotent. (c) Find a nilpotent matrix which is
neither lower nor upper triangular.


0 1.3.13. A square matrix $W$ is called unipotent if $N = W - I$ is nilpotent, as in Exercise 1.3.12,
so $(W - I)^k = O$ for some $k > 1$. (a) Show that every lower or upper triangular matrix is
unipotent if and only if it is unitriangular, meaning its diagonal entries are all equal to 1.
(b) Find a unipotent matrix which is neither lower nor upper triangular.

1.3.14. A square matrix $P$ is called idempotent if $P^2 = P$. (a) Find all $2 \times 2$ idempotent upper
triangular matrices. (b) Find all $2 \times 2$ idempotent matrices.


Elementary  Matrices 

A  key  observation  is  that  elementary  row  operations  can,  in  fact,  be  realized  by  matrix 
multiplication.  To  this  end,  we  introduce  the  first  type  of  “elementary  matrix” .  (Later  we 
will  meet  two  other  types  of  elementary  matrix,  corresponding  to  the  other  two  kinds  of 
elementary  row  operation.) 

Definition  1.1.  The  elementary  matrix  associated  with  an  elementary  row  operation  for 
m-rowed  matrices  is  the  m  x  m  matrix  obtained  by  applying  the  row  operation  to  the 
m  x  m  identity  matrix  Im  . 

For example, applying the elementary row operation that adds $-2$ times the first row to
the second row of the $3 \times 3$ identity matrix $I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ results in the corresponding
elementary matrix $E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$. We claim that, if $A$ is any 3-rowed matrix, then
multiplying $E_1 A$ has the same effect as the given elementary row operation. For example,

\[ \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 1 & 1 & 4 \end{pmatrix}, \]

which you may recognize as the first elementary row operation we used to solve our
illustrative example. If we set

\[ E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad E_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}, \qquad E_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \frac12 & 1 \end{pmatrix}, \tag{1.19} \]

then multiplication by $E_1$ will subtract twice the first row from the second row,
multiplication by $E_2$ will subtract the first row from the third row, and multiplication by $E_3$ will
add $\frac12$ the second row to the third row: precisely the row operations used to place our
original system in triangular form. Therefore, performing them in the correct order, we
conclude that when

\[ A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}, \qquad \text{then} \qquad E_3 E_2 E_1 A = U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & 0 & \frac52 \end{pmatrix}. \tag{1.20} \]

The reader is urged to check this by directly multiplying the indicated matrices. Keep in
mind that the associative property of matrix multiplication allows us to compute the above
matrix product in any convenient order:

\[ E_3 E_2 E_1 A = E_3 (E_2 (E_1 A)) = ((E_3 E_2) E_1) A = (E_3 (E_2 E_1)) A = (E_3 E_2)(E_1 A) = \cdots, \]

making sure that the overall left to right order of the matrices is maintained, since the
matrix products are usually not commutative.
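
The claim that row operations are realized by left multiplication with elementary matrices can be checked numerically. The sketch below builds the three elementary matrices of (1.19) and multiplies them against the coefficient matrix of the running example; the helper names `identity`, `elementary_matrix` and `matmul` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

def identity(m):
    """The m x m identity matrix with exact rational entries."""
    return [[Fraction(int(r == s)) for s in range(m)] for r in range(m)]

def elementary_matrix(m, c, i, j):
    """m x m elementary matrix that adds c times row j to row i (1-based, as in the text)."""
    E = identity(m)
    E[i - 1][j - 1] = Fraction(c)
    return E

def matmul(A, B):
    """Ordinary matrix product A B of two compatible matrices given as lists of rows."""
    return [[sum(A[r][k] * B[k][s] for k in range(len(B))) for s in range(len(B[0]))]
            for r in range(len(A))]

A = [[1, 2, 1], [2, 6, 1], [1, 1, 4]]
E1 = elementary_matrix(3, -2, 2, 1)               # add -2 times row 1 to row 2
E2 = elementary_matrix(3, -1, 3, 1)               # add -1 times row 1 to row 3
E3 = elementary_matrix(3, Fraction(1, 2), 3, 2)   # add 1/2 times row 2 to row 3

# Two different groupings of the same left-to-right product E3 E2 E1 A.
U = matmul(E3, matmul(E2, matmul(E1, A)))
U_alt = matmul(matmul(matmul(E3, E2), E1), A)
```

Both groupings give the same upper triangular matrix, in line with associativity; changing the left-to-right order of the factors, by contrast, generally changes the result.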

In general, then, an $m \times m$ elementary matrix $E$ of the first type will have all 1's on the
diagonal, one nonzero entry $c$ in some off-diagonal position $(i, j)$, with $i \neq j$, and all other
entries equal to zero. If $A$ is any $m \times n$ matrix, then the matrix product $E A$ is equal to
the matrix obtained from $A$ by the elementary row operation adding $c$ times row $j$ to row
$i$. (Note that the order of $i$ and $j$ is reversed.)

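To undo the operation of adding $c$ times row $j$ to row $i$, we must perform the inverse
row operation that subtracts $c$ times row $j$ from row $i$ (or, equivalently, adds $-c$ times row $j$
to row $i$). The corresponding inverse elementary matrix again has 1's along the diagonal and $-c$ in the
$(i, j)$ slot. Let us denote the inverses of the particular elementary matrices (1.19) by $L_i$,
so that, according to our general rule,

\[ L_1 = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad L_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}, \qquad L_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\frac12 & 1 \end{pmatrix}. \tag{1.21} \]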


Note that the products

\[ L_1 E_1 = L_2 E_2 = L_3 E_3 = I \tag{1.22} \]

yield the $3 \times 3$ identity matrix, reflecting the fact that the matrices represent mutually
inverse row operations. (A more thorough discussion of matrix inverses will be postponed
until Section 1.5.)

The product of the latter three elementary matrices (1.21) is equal to

\[ L = L_1 L_2 L_3 = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -\frac12 & 1 \end{pmatrix}. \tag{1.23} \]

The matrix $L$ is called a lower unitriangular matrix, where “lower triangular” means that
all the entries above the main diagonal are 0, while “uni-”, which is short for “unipotent”
as defined in Exercise 1.3.13, imposes the requirement that all the entries on the diagonal
are equal to 1. Observe that the entries of $L$ below the diagonal are the same as
the corresponding nonzero entries in the $L_i$. This is a general fact that holds when the
lower triangular elementary matrices are multiplied in the correct order. More generally,
the following elementary consequence of the laws of matrix multiplication will be used
extensively.

Lemma 1.2. If $L$ and $\widehat{L}$ are lower triangular matrices of the same size, so is their product
$L \widehat{L}$. If they are both lower unitriangular, so is their product. Similarly, if $U$, $\widehat{U}$ are upper
(uni)triangular matrices, so is their product $U \widehat{U}$.


The  LU  Factorization 


We have almost arrived at our first important result. Let us compute the product of the
matrices $L$ and $U$ in (1.20), (1.23). Using associativity of matrix multiplication,
equations (1.22), and the basic property of the identity matrix $I$, we conclude that

\[ \begin{aligned} L U &= (L_1 L_2 L_3)(E_3 E_2 E_1 A) = L_1 L_2 (L_3 E_3) E_2 E_1 A = L_1 L_2 \, I \, E_2 E_1 A \\ &= L_1 (L_2 E_2) E_1 A = L_1 \, I \, E_1 A = (L_1 E_1) A = I A = A. \end{aligned} \]

In  other  words,  we  have  factored  the  coefficient  matrix  A  =  LU  into  a  product  of  a  lower 
unitriangular  matrix  L  and  an  upper  triangular  matrix  U  with  the  nonzero  pivots  on  its 
main  diagonal.  By  similar  reasoning,  the  same  holds  true  for  any  regular  square  matrix. 

Theorem 1.3. A matrix $A$ is regular if and only if it can be factored

\[ A = L U, \tag{1.24} \]

where $L$ is a lower unitriangular matrix, having all 1's on the diagonal, and $U$ is upper
triangular with nonzero diagonal entries, which are the pivots of $A$. The nonzero
off-diagonal entries $l_{ij}$ for $i > j$ appearing in $L$ prescribe the elementary row operations that
bring $A$ into upper triangular form; namely, one subtracts $l_{ij}$ times row $j$ from row $i$ at
the appropriate step of the Gaussian Elimination process.


In  practice,  to  find  the  LU  factorization  of  a  square  matrix  A,  one  applies  the  regular 
Gaussian  Elimination  algorithm  to  reduce  A  to  its  upper  triangular  form  U.  The  entries 
of  L  can  be  filled  in  during  the  course  of  the  calculation  with  the  negatives  of  the  multiples 
used  in  the  elementary  row  operations.  If  the  algorithm  fails  to  be  completed,  which 
happens  whenever  zero  appears  in  any  diagonal  pivot  position,  then  the  original  matrix  is 
not  regular,  and  does  not  have  an  LU  factorization. 
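
The procedure just described, in which $L$ is filled in while $A$ is reduced to $U$, might be sketched in Python as follows. The function name `lu_factor_regular` and the use of exact `Fraction` arithmetic are assumptions of this sketch, and it handles only the regular case with no row interchanges.

```python
from fractions import Fraction

def lu_factor_regular(A):
    """Return (L, U) with A = L U, L lower unitriangular and U upper triangular.

    Raises ValueError when a zero pivot appears, i.e. when A is not regular.
    """
    n = len(A)
    U = [[Fraction(a) for a in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        if U[j][j] == 0:
            raise ValueError("A is not regular: zero pivot encountered")
        for i in range(j + 1, n):
            # l_ij is the negative of the multiple of row j that is added to row i.
            L[i][j] = U[i][j] / U[j][j]
            U[i] = [u_i - L[i][j] * u_j for u_i, u_j in zip(U[i], U[j])]
    return L, U

# Coefficient matrix of the running example (1.16).
L, U = lu_factor_regular([[1, 2, 1], [2, 6, 1], [1, 1, 4]])
```

For this matrix the sketch returns the same $L$ and $U$ as in (1.23) and (1.20), with the multipliers $2$, $1$, $-\frac12$ stored below the diagonal of $L$.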

Example 1.4. Let us compute the LU factorization of the matrix $A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{pmatrix}$.
Applying the Gaussian Elimination algorithm, we begin by adding $-2$ times the first row
to the second row, and then adding $-1$ times the first row to the third. The result is the
matrix $\begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & -3 & -1 \end{pmatrix}$. The next step adds the second row to the third row, leading to the
upper triangular matrix $U = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix}$, whose diagonal entries are the pivots. The
corresponding lower triangular matrix is $L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}$; its entries lying below the
main diagonal are the negatives of the multiples we used during the elimination procedure.
For instance, the (2, 1) entry indicates that we added $-2$ times the first row to the second
row, and so on. The reader might wish to verify the resulting factorization

\[ \begin{pmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{pmatrix} = A = L U = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \]


Exercises 

1.3.15.  What  elementary  row  operations  do  the  following  matrices  represent?  What  size 


matrices  do  they  apply  to? 

(a)  fn  (b) 


\ 

(10  0\ 

(  1  °  0\ 

).  (c) 

0  1  -5 

,  (d) 

0  10 
-i 

,  (e) 

/ 

V0  0  1  J 

U  0  lj 

0 

0 


0 

1 

0 

0 


0 

0 

1 

0 


0\ 
-3 
0 
1 ) 


1.3.16.  Write  down  the  elementary  matrix  corresponding  to  the  following  row  operations  on 
4x4  matrices:  (a)  Add  the  third  row  to  the  fourth  row.  (b)  Subtract  the  fourth  row 
from  the  third  row.  (c)  Add  3  times  the  last  row  to  the  first  row.  (d)  Subtract  twice  the 
second  row  from  the  fourth  row. 

1.3.17. Compute the product $L_3 L_2 L_1$ of the elementary matrices (1.21). Compare your
answer with (1.23).

1.3.18. Determine the product $E_3 E_2 E_1$ of the elementary matrices in (1.19). Is this the same
as the product $E_1 E_2 E_3$? Which is easier to predict?

1.3.19. (a) Explain, using their interpretation as elementary row operations, why elementary
matrices do not generally commute: $E \widehat{E} \neq \widehat{E} E$. (b) Which pairs of the elementary
matrices listed in (1.19) commute? (c) Can you formulate a general rule that tells in
advance whether two given elementary matrices commute?

1.3.20. Determine which of the following $3 \times 3$ matrices is (i) upper triangular, (ii) upper
unitriangular, (iii) lower triangular, and/or (iv) lower unitriangular:


(l 

2 

0\ 

(1 

0 

A 

(i 

0 

°\ 

(l 

0 

°\ 

(0 

0 

°\ 

(a) 

0 

3 

2 

( b ) 

0 

1 

0 

(c) 

2 

0 

0 

(d) 

0 

1 

0 

(e) 

0 

3 

1 

^0 

0 

-y 

VO 

0 

i  / 

V o 

3 

y 

Vi 

-4 

1  J 

Vo 

1 

o  J 

1.3.21.  Find  the  LU  factorization  of  the  following  matrices:  (a) 


1 

1 


(*>) 


(  - 1 

1 

-i\ 

(2 

0 

3\ 

(-1 

0 

°\ 

/  1 

0 

-i\ 

( c ) 

1 

1 

i 

,  (d) 

1 

3 

1  ’ 

(e) 

2 

-3 

0 

,  (C 

2 

3 

2 

^-1 

1 

V 

1 

1/ 

V  i 

3 

A 

^"3 

1 

o; 

fe) 


( 


\ 


1 

0 

-1 

0 


0 

2 

3 

1 


-1 

-1 

0 

2 


0\ 
-1 
2 
1  J 


( h ) 


(  1 
-1 
-2 
3 


V 


1.3.22.  Given  the  factorization  A  = 


f 


V 


2 

6 

4 


1 

2 

1 

0 

-1 

4 

-6 


2 

3 

1 

1 


3  \ 
0 

-2 

5/ 


(i) 


(2 
1 
3 

VI 


1 

4 

0 

1 


3 

0 

2 

2 


/ 


V 


1 

3 

2 


0 

1 

4 


1\ 

1 
2 

2/ 

-1 

1 

0 


explain,  without  computing,  which  elementary  row  operations  are  used  to  reduce  A  to 
upper  triangular  form.  Be  careful  to  state  the  order  in  which  they  should  be  applied.  Then 
check  the  correctness  of  your  answer  by  performing  the  elimination. 
1.3.23. (a) Write down a $4 \times 4$ lower unitriangular matrix whose entries below the diagonal
are  distinct  nonzero  numbers,  (b)  Explain  which  elementary  row  operation  each  entry 
corresponds  to.  (c)  Indicate  the  order  in  which  the  elementary  row  operations  should  be 
performed  by  labeling  the  entries  1,2,3,.... 


0 1.3.24. Let $t_1, t_2, \ldots$ be distinct real numbers. Find the LU factorization of the following


Vandermonde  matrices : 


1  \ 

h 

t2 

r4 

t\) 


Can  you  spot  a  pattern?  Test  your  conjecture  with  the  5x5  Vandermonde  matrix. 


1.3.25. Write down the explicit requirements on its entries $a_{ij}$ for a square matrix $A$ to be

(a)  diagonal,  (b)  upper  triangular,  (c)  upper  unitriangular,  (d)  lower  triangular, 

(e)  lower  unitriangular. 

0 1.3.26. (a) Explain why the product of two lower triangular matrices is lower triangular.

(b)  What  can  you  say  concerning  the  diagonal  entries  of  the  product  of  two  lower 
triangular  matrices?  (c)  Explain  why  the  product  of  two  lower  unitriangular  matrices  is 
also  lower  unitriangular. 

1.3.27.  True  or  false:  If  A  has  a  zero  entry  on  its  main  diagonal,  it  is  not  regular. 

1.3.28.  In  general,  how  many  elementary  row  operations  does  one  need  to  perform  in  order  to 
reduce  a  regular  n  x  n  matrix  to  upper  triangular  form? 


1.3.29. Prove that if $A$ is a regular $2 \times 2$ matrix, then its LU factorization is unique. In other
words, if $A = L U = \widehat{L} \widehat{U}$ where $L$, $\widehat{L}$ are lower unitriangular and $U$, $\widehat{U}$ are upper triangular,
then $L = \widehat{L}$ and $U = \widehat{U}$. (The general case appears in Proposition 1.30.)


0  1.3.30.  Prove  directly  that  the  matrix  A 


does  not  have  an  LU  factorization. 


0  1.3.31.  Suppose  A  is  regular,  (a)  Show  that  the  matrix  obtained  by  multiplying  each  column 
of  A  by  the  sign  of  its  pivot  is  also  regular  and,  moreover,  has  all  positive  pivots. 

(b)  Show  that  the  matrix  obtained  by  multiplying  each  row  of  A  by  the  sign  of  its  pivot  is 

also  regular  and  has  all  positive  pivots.  /  —  2  2  1 

(c)  Check  these  results  in  the  particular  case  A  = 


V 


1 

4 


0 

2 


Forward  and  Back  Substitution 

Knowing  the  LU  factorization  of  a  regular  matrix  A  enables  us  to  solve  any  associated 
linear  system  A  x  =  b  in  two  easy  stages: 

(1)  First,  solve  the  lower  triangular  system 

Lc  =  b  (1.25) 

for the vector $c$ by Forward Substitution. This is the same as Back Substitution, except
one solves the equations for the variables in the direct order, from first to last. Explicitly,

\[ c_1 = b_1, \qquad c_i = b_i - \sum_{j=1}^{i-1} l_{ij} c_j, \quad \text{for } i = 2, 3, \ldots, n, \tag{1.26} \]

noting that the previously computed values of $c_1, \ldots, c_{i-1}$ are used to determine $c_i$.

(2) Second, solve the resulting upper triangular system

\[ U x = c \tag{1.27} \]

by Back Substitution. The values of the unknowns

\[ x_n = \frac{c_n}{u_{nn}}, \qquad x_i = \frac{1}{u_{ii}} \Bigl( c_i - \sum_{j=i+1}^{n} u_{ij} x_j \Bigr), \quad \text{for } i = n-1, \ldots, 2, 1, \tag{1.28} \]

are successively computed, but now in reverse order. It is worth pointing out that the
requirement that each pivot be nonzero, $u_{ii} \neq 0$, is essential here, as otherwise we would
not be able to solve for the corresponding variable $x_i$.

Note that the combined algorithm does indeed solve the original system, since if

\[ U x = c \quad \text{and} \quad L c = b, \quad \text{then} \quad A x = L U x = L c = b. \]


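Example 1.5. With the LU decomposition

\[ \begin{pmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix} \]

found in Example 1.4, we can readily solve any linear system with the given coefficient
matrix by Forward and Back Substitution. For instance, to find the solution to

\[ \begin{pmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}, \]

we first solve the lower triangular system

\[ \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}, \qquad \text{or, explicitly,} \qquad \begin{aligned} a &= 1, \\ 2a + b &= 2, \\ a - b + c &= 2. \end{aligned} \]

The first equation says $a = 1$; substituting into the second, we find $b = 0$; the final equation
yields $c = 1$. We then use Back Substitution to solve the upper triangular system

\[ \begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \qquad \text{which is} \qquad \begin{aligned} 2x + y + z &= 1, \\ 3y &= 0, \\ -z &= 1. \end{aligned} \]

We find $z = -1$, then $y = 0$, and then $x = 1$, which is indeed the solution.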


Thus,  once  we  have  found  the  LU  factorization  of  the  coefficient  matrix  A,  the  Forward 
and  Back  Substitution  processes  quickly  produce  the  solution  to  any  system  Ax  =  b. 
Moreover,  they  can  be  straightforwardly  programmed  on  a  computer.  In  practice,  to  solve 
a  system  from  scratch,  it  is  just  a  matter  of  taste  whether  you  work  directly  with  the 
augmented  matrix,  or  first  determine  the  LU  factorization  of  the  coefficient  matrix,  and 
then  apply  Forward  and  Back  Substitution  to  compute  the  solution. 
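
To illustrate how the two substitution stages (1.26) and (1.28) might be programmed, here is a short Python sketch applied to the factorization of Example 1.4 and the right-hand side of Example 1.5. The function names `forward_substitution` and `back_substitution` are our own, and the code assumes that $L$ is lower unitriangular and that $U$ has nonzero diagonal entries.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve L c = b for lower unitriangular L, following (1.26)."""
    n = len(L)
    c = [Fraction(0)] * n
    for i in range(n):
        # Uses the previously computed c_0, ..., c_{i-1}; the diagonal entry l_ii is 1.
        c[i] = Fraction(b[i]) - sum(L[i][j] * c[j] for j in range(i))
    return c

def back_substitution(U, c):
    """Solve U x = c for upper triangular U with nonzero pivots, following (1.28)."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        x[i] = (Fraction(c[i]) - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# Factors from Example 1.4 and the right-hand side used in Example 1.5.
L = [[1, 0, 0], [2, 1, 0], [1, -1, 1]]
U = [[2, 1, 1], [0, 3, 0], [0, 0, -1]]
b = [1, 2, 2]

c = forward_substitution(L, b)   # intermediate vector (a, b, c) of Example 1.5
x = back_substitution(U, c)      # solution (x, y, z)
```

Once `L` and `U` are known, solving for a different right-hand side only requires rerunning the two substitution calls, which is the practical advantage of keeping the factorization.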


Exercises 


1.3.32. Given the LU factorizations you calculated in Exercise 1.3.21, solve the associated
linear systems $A x = b$, where $b$ is the column vector with all entries equal to 1.

1.3.33. In each of the following problems, find the $A = L U$ factorization of the coefficient
matrix, and then use Forward and Back Substitution to solve the corresponding linear
systems $A x = b_j$ for each of the indicated right-hand sides:


(a)  A  = 


1 

3 


(- 1  1  -1\ 

(  i\ 

(~3\ 

(b)  A  = 

11  1  ,  b,  = 

-i  ,  b2  = 

0 

K- i  i  2; 

^  1/ 

^  2/ 

(c)  A  = 


(9-2  -1\ 

-6  11 
V  2  -1  Oj 


( 2.0  .3  .4  \ 

/ 1  \ 

fo\ 

/0\ 

(d)  A  = 

.3  4.0  .5 

.  W  = 

0 

■  b2  =  1 

,  b,  = 

0 

^  .4  .5  6.0  J 

^0^ 

\o) 

\1  / 

t 


(e)  A  = 


(f)  A  = 


1 
0 

-1 
0 

1 

4 

-8 
V-4 


0 

2 

3 

-1 

-2 

1 

-1 

-1 


-1 

3 

2 

2 

0 

-1 

2 

1 


0\ 
1 
2 

1  J 

2  \ 
1 
1 
2  / 


bi  = 


bi  = 


/  1\ 
0 

-1 

V  i  J 

l i\ 

0 
0 

Vo  J 


b2  = 


b2  = 


/  0\ 
-1 
0 

V  i ) 

(  3  \ 

0 

-1 

V  2  J 


b3  = 


/  2  \ 
3 

-2 
V  1 J 


1.4  Pivoting  and  Permutations 


The  method  of  Gaussian  Elimination  presented  so  far  applies  only  to  regular  matrices. 
But  not  every  square  matrix  is  regular;  a  simple  class  of  examples  is  matrices  whose  upper 
left,  i.e.,  (1,1),  entry  is  zero,  and  so  cannot  serve  as  the  first  pivot.  More  generally,  the 
algorithm  cannot  proceed  whenever  a  zero  entry  appears  in  the  current  pivot  position  on 
the  diagonal.  What  then  to  do?  The  answer  requires  revisiting  the  source  of  the  method. 

Consider,  as  a  specific  example,  the  linear  system 

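\[ \begin{aligned} 2y + z &= 2, \\ 2x + 6y + z &= 7, \\ x + y + 4z &= 3. \end{aligned} \tag{1.29} \]

The augmented coefficient matrix is

\[ \left(\begin{array}{ccc|c} 0 & 2 & 1 & 2 \\ 2 & 6 & 1 & 7 \\ 1 & 1 & 4 & 3 \end{array}\right). \]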

In  this  case,  the  (1,1)  entry  is  0,  and  so  is  not  a  legitimate  pivot.  The  problem,  of  course, 
is  that  the  first  variable  x  does  not  appear  in  the  first  equation,  and  so  we  cannot  use  it 
to  eliminate  x  in  the  other  two  equations.  But  this  “problem”  is  actually  a  bonus  —  we 
already  have  an  equation  with  only  two  variables  in  it,  and  so  we  need  to  eliminate  x  from 
only  one  of  the  other  two  equations.  To  be  systematic,  we  rewrite  the  system  in  a  different 
order, 

   2x + 6y +  z = 7,
         2y +  z = 2,
    x +  y + 4z = 3,




by  interchanging  the  first  two  equations.  In  other  words,  we  employ 


Linear  System  Operation  #2: 


Interchange  two  equations. 


Clearly,  this  operation  does  not  change  the  solution  and  so  produces  an  equivalent  linear 
system.  In  our  case,  the  augmented  coefficient  matrix, 


\[
\left(\begin{array}{ccc|c} 2 & 6 & 1 & 7 \\ 0 & 2 & 1 & 2 \\ 1 & 1 & 4 & 3 \end{array}\right),
\]


can  be  obtained  from  the  original  by  performing  the  second  type  of  row  operation: 


Elementary  Row  Operation  #2: 


Interchange  two  rows  of  the  matrix. 


The new nonzero upper left entry, 2, can now serve as the first pivot, and we may continue to apply elementary row operations of type #1 to reduce our matrix to upper triangular form. For this particular example, we eliminate the remaining nonzero entry in the first column by subtracting 1/2 the first row from the last:

\[
\left(\begin{array}{ccc|c} 2 & 6 & 1 & 7 \\ 0 & 2 & 1 & 2 \\ 0 & -2 & \frac{7}{2} & -\frac{1}{2} \end{array}\right).
\]

The (2, 2) entry serves as the next pivot. To eliminate the nonzero entry below it, we add the second to the third row:

\[
\left(\begin{array}{ccc|c} 2 & 6 & 1 & 7 \\ 0 & 2 & 1 & 2 \\ 0 & 0 & \frac{9}{2} & \frac{3}{2} \end{array}\right).
\]

We have now placed the system in upper triangular form, with the three pivots 2, 2, and 9/2 along the diagonal. Back Substitution produces the solution x = 5/6, y = 5/6, z = 1/3.

The  row  interchange  that  is  required  when  a  zero  shows  up  in  the  diagonal  pivot  position 
is  known  as  pivoting.  Later,  in  Section  1.7,  we  will  discuss  practical  reasons  for  pivoting 
even  when  a  diagonal  entry  is  nonzero.  Let  us  distinguish  the  class  of  matrices  that  can  be 
reduced  to  upper  triangular  form  by  Gaussian  Elimination  with  pivoting.  These  matrices 
will  prove  to  be  of  fundamental  importance  throughout  linear  algebra. 


Definition 1.6. A square matrix is called nonsingular if it can be reduced to upper triangular form with all nonzero elements on the diagonal (the pivots) by elementary row operations of types 1 and 2.


In  contrast,  a  singular  square  matrix  cannot  be  reduced  to  such  upper  triangular  form 
by  such  row  operations,  because  at  some  stage  in  the  elimination  procedure  the  diagonal 
entry  and  all  the  entries  below  it  are  zero.  Every  regular  matrix  is  nonsingular,  but,  as 
we  just  saw,  not  every  nonsingular  matrix  is  regular.  Uniqueness  of  solutions  is  the  key 
defining  characteristic  of  nonsingularity. 

Theorem  1.7.  A  linear  system  Ax  =  b  has  a  unique  solution  for  every  choice  of  right- 
hand  side  b  if  and  only  if  its  coefficient  matrix  A  is  square  and  nonsingular. 

We are able to prove the “if” part of this theorem, since nonsingularity implies reduction to an equivalent upper triangular form that has the same solutions as the original system. The unique solution to the system is then found by Back Substitution. The “only if” part will be proved in Section 1.8.


The revised version of the Gaussian Elimination algorithm, valid for all nonsingular coefficient matrices, is implemented by the accompanying pseudocode program. The starting point is the augmented matrix M = (A | b) representing the linear system Ax = b. After successful termination of the program, the result is an augmented matrix in upper triangular form M = (U | c) representing the equivalent linear system Ux = c. One then uses Back Substitution to determine the solution x to the linear system.


Gaussian  Elimination  —  Nonsingular  Case 


start
   for j = 1 to n
      if m_kj = 0 for all k ≥ j, stop; print “A is singular”
      if m_jj = 0 but m_kj ≠ 0 for some k > j, switch rows k and j
      for i = j + 1 to n
         set l_ij = m_ij / m_jj
         add −l_ij times row j to row i of M
      next i
   next j
end
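
The pseudocode above can be sketched in Python as follows. This is only an illustrative transcription, not part of the original program: the function names `gaussian_elimination` and `back_substitution` are assumptions, indices are 0-based, and exact rational arithmetic (`fractions.Fraction`) is used so that the test for a zero pivot is reliable. When pivoting is needed, the sketch simply takes the first row below with a nonzero entry.

```python
from fractions import Fraction

def gaussian_elimination(A, b):
    """Reduce the augmented matrix (A | b) to upper triangular form (U | c)."""
    n = len(A)
    # Augmented matrix M with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for j in range(n):
        # Look for a row k >= j with a nonzero entry in column j.
        k = next((r for r in range(j, n) if M[r][j] != 0), None)
        if k is None:
            raise ValueError("A is singular")
        if k != j:
            M[j], M[k] = M[k], M[j]          # row operation of type #2
        for i in range(j + 1, n):
            l_ij = M[i][j] / M[j][j]
            # Add -l_ij times row j to row i (row operation of type #1).
            M[i] = [mi - l_ij * mj for mi, mj in zip(M[i], M[j])]
    return M

def back_substitution(M):
    """Solve the upper triangular system U x = c stored as M = (U | c)."""
    n = len(M)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# The system (1.29), whose (1,1) entry is zero.
A = [[0, 2, 1], [2, 6, 1], [1, 1, 4]]
b = [2, 7, 3]
M = gaussian_elimination(A, b)
x = back_substitution(M)
print(x)  # [Fraction(5, 6), Fraction(5, 6), Fraction(1, 3)]
```

Applied to (1.29), the sketch performs the same row interchange as the worked example and ends with the pivots 2, 2, 9/2 on the diagonal.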


Remark. When performing the algorithm using exact arithmetic, when pivoting is required it does not matter which row k one chooses to switch with row j, as long as it lies below row j and the (k, j) entry is nonzero. When dealing with matters involving numerical precision and round off errors, there are some practical rules of thumb to be followed to maintain accuracy in the intervening computations. These will be discussed in Section 1.7.


Exercises 


1.4.1.  Determine  whether  the  following  matrices  are  singular  or  nonsingular: 

f  1  1  3\ 

(c) 


(a) 


0  1 
1  2 


( b ) 


1  2 

4  -8 


( d ) 


2  2  2 
3-11  / 


7  (e) 


fl 
4 


1 

0 

—3  \ 

(0 

-1 

0 

i\ 

(  1 

-2 

0 

2  \ 

2 

-2 

4 

0 

,  (s) 

1 

0 

-1 

0 

,  (h) 

4 

1 

-1 

-1 

1 

-2 

2 

-1 

0 

2 

0 

-2 

-8 

-1 

2 

1 

l  0 

1 

0 

\2 

0 

2 

\  —4 

-1 

1 

2  / 

2 

5 


\7  8  9 


(0 


1.4.2. Classify the following matrices as (i) regular, (ii) nonsingular, and/or (iii) singular:

/I  3  -3  0\ 


(a) 


2  1 
1  4 


/ 

3 

-2 

i\ 

1 

-2 

(b) 

-1 

4 

4 

.  (c) 

-2 

4 

V 

2 

2 

5/ 

3 

-1 

-1  0-12 
3-2  6  1 

V  2  -1  3  5/ 


1.4.3. Find the equation z = ax + by + c for the plane passing through the three points p1 = (0, 2, −1), p2 = (−2, 4, 3), p3 = (2, −1, −3).

1.4.4. Show that a 2 × 2 matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} is (a) nonsingular if and only if ad − bc ≠ 0; (b) regular if and only if ad − bc ≠ 0 and a ≠ 0.


1.4.5.  Solve  the  following  systems  of  equations  by  Gaussian  Elimination: 

x1  —  2x2  +  2x3  =  15,  2x1  —  x2  =  1,  x2  —  x3  =  4, 

(a)  x1  —  2x2  +  x3  =  10,  (b)  —  4^  +  2x2  —  3x3  =  —  8,  (c)  —2x1  —  5x2  =  2, 

2^  —  x2  —  2x3  =  —10.  —  3x2  +  x3  =  5.  xi  +  x3  =  ~&- 

x  —  y  A  z  —  w  =  0,  —  2x  +  2y  —  z  +  w  =  2,  — 3x2  +  2x3  =  0,  x3  —  x4  =  2, 

(d)  (e)  ^ 

— 4x  +  4^/  +  32:  =  5,  x  —  3  y  w  =  4 .  2  x3  =  1,  4x^4_7x4:=:  5. 

1.4.6.  True  or  false:  A  singular  matrix  cannot  be  regular. 

1.4.7.  True  or  false:  A  square  matrix  that  has  a  column  with  all  0  entries  is  singular.  What 
can  you  say  about  a  linear  system  that  has  such  a  coefficient  matrix? 


♦ 1.4.8. Explain why the solution to the homogeneous system Ax = 0 with nonsingular coefficient matrix is x = 0.


1.4.9.  Write  out  the  details  of  the  proof  of  the  “if”  part  of  Theorem  1.7:  if  A  is  nonsingular, 
then  the  linear  system  Ax  =  b  has  a  unique  solution  for  every  b. 


Permutations  and  Permutation  Matrices 

As with the first type of elementary row operation, row interchanges can be accomplished by multiplication by a second type of elementary matrix, which is found by applying the row operation to the identity matrix of the appropriate size. For instance, interchanging rows 1 and 2 of the 3 × 3 identity matrix produces the elementary interchange matrix

\[
P = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

The result PA of multiplying any 3-rowed matrix A on the left by P is the same as interchanging the first two rows of A. For instance,

\[
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
=
\begin{pmatrix} 4 & 5 & 6 \\ 1 & 2 & 3 \\ 7 & 8 & 9 \end{pmatrix}.
\]

Multiple  row  interchanges  are  accomplished  by  combining  such  elementary  interchange 
matrices.  Each  such  combination  of  row  interchanges  uniquely  corresponds  to  what  is 
called  a  permutation  matrix. 

Definition  1.8.  A  permutation  matrix  is  a  matrix  obtained  from  the  identity  matrix  by 
any  combination  of  row  interchanges. 

In  particular,  applying  a  row  interchange  to  a  permutation  matrix  produces  another 
permutation  matrix.  The  following  result  is  easily  established. 

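Lemma 1.9. A matrix P is a permutation matrix if and only if each row of P contains all 0 entries except for a single 1, and, in addition, each column of P also contains all 0 entries except for a single 1.

In general, if, in the permutation matrix P, a 1 appears in position (i, j), then multiplication by P will move the jth row of A into the ith row of the product PA.

Example 1.10. There are six different 3 × 3 permutation matrices, namely

\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},\quad
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},\quad
\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},\quad
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},\quad
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix},\quad
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.
\qquad (1.30)
\]

These have the following effects: if A is a matrix with row vectors r_1, r_2, r_3, then multiplication on the left by each of the six permutation matrices produces, respectively,

\[
\begin{pmatrix} r_1 \\ r_2 \\ r_3 \end{pmatrix},\quad
\begin{pmatrix} r_2 \\ r_3 \\ r_1 \end{pmatrix},\quad
\begin{pmatrix} r_3 \\ r_1 \\ r_2 \end{pmatrix},\quad
\begin{pmatrix} r_2 \\ r_1 \\ r_3 \end{pmatrix},\quad
\begin{pmatrix} r_3 \\ r_2 \\ r_1 \end{pmatrix},\quad
\begin{pmatrix} r_1 \\ r_3 \\ r_2 \end{pmatrix}.
\qquad (1.31)
\]

Thus, the first permutation matrix, which is the identity, does nothing: it is the identity permutation. The fourth, fifth, and sixth represent row interchanges. The second and third are non-elementary permutations, and can be realized by a pair of successive row interchanges.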
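
The rule that a 1 in position (i, j) of P moves row j of A into row i of PA, together with the row and column criterion of Lemma 1.9, can be checked with a small Python sketch. The helper names `matmul`, `permutation_matrix`, and `is_permutation_matrix` are illustrative assumptions, and indices are 0-based.

```python
def matmul(P, A):
    """Plain matrix product of two lists of rows."""
    return [[sum(P[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(P))]

def permutation_matrix(pi):
    """Matrix whose row i has its single 1 in column pi[i] (0-based)."""
    n = len(pi)
    return [[1 if j == pi[i] else 0 for j in range(n)] for i in range(n)]

def is_permutation_matrix(P):
    """Lemma 1.9: every row and every column holds a single 1, all else 0."""
    n = len(P)
    target = [0] * (n - 1) + [1]
    rows_ok = all(sorted(row) == target for row in P)
    cols_ok = all(sorted(col) == target for col in zip(*P))
    return rows_ok and cols_ok

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
P = permutation_matrix([1, 0, 2])   # interchange the first two rows
PA = matmul(P, A)
print(PA)  # [[4, 5, 6], [1, 2, 3], [7, 8, 9]]
```

Row i of the product is row pi[i] of A, which is the same statement as the rule above with j = pi[i].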


In general, any rearrangement of a finite ordered collection of objects is called a permutation. Thus, the 6 permutation matrices (1.30) produce the 6 possible permutations (1.31) of the rows of a 3 × 3 matrix. In general, if a permutation π rearranges the integers (1, ..., n) to form (π(1), ..., π(n)), then the corresponding permutation matrix P = P_π that maps row r_i to row r_{π(i)} will have 1's in positions (i, π(i)) for i = 1, ..., n and zeros everywhere else. For example, the second permutation matrix in (1.30) corresponds to the permutation with π(1) = 2, π(2) = 3, π(3) = 1. Keep in mind that π(1), ..., π(n) is merely a rearrangement of the integers 1, ..., n, so that 1 ≤ π(i) ≤ n and π(i) ≠ π(j) when i ≠ j.

An  elementary  combinatorial  argument  proves  that  there  is  a  total  of 

    n! = n (n − 1) (n − 2) ··· 3 · 2 · 1        (1.32)

different permutations of (1, ..., n), and hence the same number of permutation matrices of size n × n. Moreover, the product P = P1 P2 of any two permutation matrices is also a permutation matrix, and corresponds to the composition of the two permutations, meaning one permutes according to P2 and then permutes the result according to P1. An important
point  is  that  multiplication  of  permutation  matrices  is  noncommutative  —  the  order  in 
which  one  permutes  makes  a  difference.  Switching  the  first  and  second  rows,  and  then 
switching  the  second  and  third  rows,  does  not  have  the  same  effect  as  first  switching  the 
second  and  third  rows  and  then  switching  the  first  and  second  rows! 
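
Both claims, the count n! and the noncommutativity of row interchanges, are easy to test numerically. The following Python sketch does so for n = 3; the helper names are illustrative assumptions and the indices are 0-based.

```python
from itertools import permutations
from math import factorial

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def permutation_matrix(pi):
    """Matrix with 1's in positions (i, pi[i]), zeros elsewhere (0-based)."""
    n = len(pi)
    return [[1 if j == pi[i] else 0 for j in range(n)] for i in range(n)]

S12 = permutation_matrix([1, 0, 2])   # switch the first and second rows
S23 = permutation_matrix([0, 2, 1])   # switch the second and third rows

# Acting on rows: the right-hand factor is applied first.
first_12_then_23 = matmul(S23, S12)
first_23_then_12 = matmul(S12, S23)
print(first_12_then_23)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(first_23_then_12)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

# One permutation matrix for each of the n! permutations.
all_3x3 = [permutation_matrix(p) for p in permutations(range(3))]
print(len(all_3x3) == factorial(3))  # True
```

The two products are the second and third matrices in (1.30), which confirms that these are obtained from a pair of successive row interchanges and that the order of the interchanges matters.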


Exercises 


1.4.10. Write down the elementary 4 × 4 permutation matrix (a) P1 that permutes the second and fourth rows, and (b) P2 that permutes the first and fourth rows. (c) Do P1 and P2 commute? (d) Explain what the matrix products P1 P2 and P2 P1 do to a 4 × 4 matrix.


1.4.11.  Write  down  the  permutation  matrix  P  such  that 

fd\ 


(a)  P 


/  u\ 

V 

\WJ 


(b)  p 


(  a\ 
b 

c 

\dj 


c 
a 

\  bj 


(c)  P 


/  a\ 
b 

c 

\dj 


/  b\ 
a 
d 

W 


(d)  P 


(  x\  \ 

x2 

XA 


Vx5/ 

1.4.12.  Construct  a  multiplication  table  that  shows  all  possible  products  of  the  3x3 
permutation  matrices  (1.30).  List  all  pairs  that  commute. 


(x  4\ 

x1 

x3 
x2 

Vx5/ 
1.4.13. Write down all 4 × 4 permutation matrices that (a) fix the third row of a 4 × 4 matrix A; (b) take the third row to the fourth row; (c) interchange the second and third rows.

1.4.14. True or false: (a) Every elementary permutation matrix satisfies P^2 = I. (b) Every permutation matrix satisfies P^2 = I. (c) A matrix that satisfies P^2 = I is necessarily a permutation matrix.

1.4.15. (a) Let P and Q be n × n permutation matrices and v ∈ R^n a vector. Under what conditions does the equation Pv = Qv imply that P = Q? (b) Answer the same question when PA = QA, where A is an n × k matrix.


1.4.16.  Let  P  be  the  3x3  permutation  matrix  such  that  the  product  PA  permutes  the  first 
and  third  rows  of  the  3x3  matrix  A.  (a)  Write  down  P.  ( b )  True  or  false:  The  product 
AP  is  obtained  by  permuting  the  first  and  third  columns  of  A. 

(c)  Does  the  same  conclusion  hold  for  every  permutation  matrix:  is  the  effect  of  PA  on  the 
rows  of  a  square  matrix  A  the  same  as  the  effect  of  A  P  on  the  columns  of  A? 


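♥ 1.4.17. A common notation for a permutation π of the integers {1, ..., m} is as a 2 × m matrix

\[
\begin{pmatrix} 1 & 2 & 3 & \dots & m \\ \pi(1) & \pi(2) & \pi(3) & \dots & \pi(m) \end{pmatrix},
\]

indicating that π takes i to π(i). (a) Show that such a permutation corresponds to the permutation matrix with 1's in positions (π(j), j) for j = 1, ..., m. (b) Write down the permutation matrices corresponding to the following permutations:

\[
\text{(i)}\ \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix},\quad
\text{(ii)}\ \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 3 & 1 \end{pmatrix},\quad
\text{(iii)}\ \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 2 & 3 \end{pmatrix},\quad
\text{(iv)}\ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 4 & 3 & 2 & 1 \end{pmatrix}.
\]

Which are elementary matrices? (c) Write down, using the preceding notation, the permutations corresponding to the following permutation matrices:

\[
\text{(i)}\ \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},\quad
\text{(ii)}\ \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix},\quad
\text{(iii)}\ \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix},\quad
\text{(iv)}\ \begin{pmatrix} 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix}.
\]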

♦ 1.4.18. Justify the statement that there are n! different n × n permutation matrices.


1.4.19. Consider the following combination of elementary row operations of type #1: (i) Add row i to row j. (ii) Subtract row j from row i. (iii) Add row i to row j again. Prove that the net effect is to interchange −1 times row i with row j. Thus, we can almost produce an elementary row operation of type #2 by a combination of elementary row operations of type #1. Lest you be tempted to try, Exercise 1.9.16 proves that one cannot produce a bona fide row interchange by a combination of elementary row operations of type #1.


1.4.20.  What  is  the  effect  of  permuting  the  columns  of  its  coefficient  matrix  on  a  linear  system? 


The  Permuted  LU  Factorization 

As we now know, every nonsingular matrix A can be reduced to upper triangular form by elementary row operations of types #1 and #2. The row interchanges merely reorder the equations. If one performs all of the required row interchanges in advance, then the elimination algorithm can proceed without requiring any further pivoting. Thus, the matrix obtained by permuting the rows of A in the prescribed manner is regular. In other words, if A is a nonsingular matrix, then there is a permutation matrix P such that the product PA is regular, and hence admits an LU factorization. As a result, we deduce the general permuted LU factorization

    P A = L U,        (1.33)

where P is a permutation matrix, L is lower unitriangular, and U is upper triangular with the pivots on the diagonal. For instance, in the preceding example, we permuted the first and second rows, and hence equation (1.33) has the explicit form

\[
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \frac{1}{2} & -1 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 6 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & \frac{9}{2} \end{pmatrix}.
\]

We  have  now  established  the  following  generalization  of  Theorem  1.3. 

Theorem  1.11.  Let  A  be  an  n  x  n  matrix.  Then  the  following  conditions  are  equivalent: 

(i)  A  is  nonsingular. 

(ii)  A  has  n  nonzero  pivots. 

(iii) A admits a permuted LU factorization: PA = LU.

A practical method to construct a permuted LU factorization of a given matrix A would proceed as follows. First set up P = L = I as n × n identity matrices. The matrix P will keep track of the permutations performed during the Gaussian Elimination process, while the entries of L below the diagonal are gradually replaced by the negatives of the multiples used in the corresponding row operations of type #1. Each time two rows of A are interchanged, the same two rows of P will be interchanged. Moreover, any pair of entries that both lie below the diagonal in these same two rows of L must also be interchanged, while entries lying on and above its diagonal need to stay in their place. At a successful conclusion to the procedure, A will have been converted into the upper triangular matrix U, while L and P will assume their final form. Here is an illustrative example.
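
The bookkeeping just described can be sketched in Python. This is a hedged illustration rather than a definitive implementation: the function name `permuted_lu` is an assumption, indices are 0-based, exact `Fraction` arithmetic is used, and when a pivot is zero the sketch interchanges with the first lower row that has a nonzero entry in that column.

```python
from fractions import Fraction

def permuted_lu(A):
    """Return (P, L, U) with P A = L U for a nonsingular square matrix A."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]          # working copy of A
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for j in range(n):
        k = next((r for r in range(j, n) if U[r][j] != 0), None)
        if k is None:
            raise ValueError("A is singular")
        if k != j:
            U[j], U[k] = U[k], U[j]        # interchange rows of A
            P[j], P[k] = P[k], P[j]        # same interchange in P
            for c in range(j):             # only entries of L below the diagonal
                L[j][c], L[k][c] = L[k][c], L[j][c]
        for i in range(j + 1, n):
            L[i][j] = U[i][j] / U[j][j]    # negative of the multiple used
            U[i] = [ui - L[i][j] * uj for ui, uj in zip(U[i], U[j])]
    return P, L, U

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2, -1, 0], [2, 4, -2, -1], [-3, -5, 6, 1], [-1, 2, 8, -2]]
P, L, U = permuted_lu(A)
print(matmul(P, A) == matmul(L, U))  # True
```

The test matrix is the one used in Example 1.12; with this pivoting rule the sketch makes two row interchanges and ends with pivots 1, 1, −5, −1.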

Example 1.12. Our goal is to produce a permuted LU factorization of the matrix

\[
A = \begin{pmatrix} 1 & 2 & -1 & 0 \\ 2 & 4 & -2 & -1 \\ -3 & -5 & 6 & 1 \\ -1 & 2 & 8 & -2 \end{pmatrix}.
\]

To begin the procedure, we apply row operations of type #1 to eliminate the entries below the first pivot. The updated matrices† are

\[
A = \begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 1 & 3 & 1 \\ 0 & 4 & 7 & -2 \end{pmatrix},\quad
L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ -3 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix},\quad
P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
\]

where L keeps track of the row operations, and we initialize P to be the identity matrix. The (2, 2) entry of the new A is zero, and so we interchange its second and third rows, leading to

\[
A = \begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & 1 & 3 & 1 \\ 0 & 0 & 0 & -1 \\ 0 & 4 & 7 & -2 \end{pmatrix},\quad
L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -3 & 1 & 0 & 0 \\ 2 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix},\quad
P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]

† Here, we are adopting computer programming conventions, where updates of a matrix are all given the same name.

We interchanged the same two rows of P, while in L we only interchanged the already computed entries in its second and third rows that lie in its first column below the diagonal. We then eliminate the nonzero entry lying below the (2, 2) pivot, leading to

\[
A = \begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & 1 & 3 & 1 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & -5 & -6 \end{pmatrix},\quad
L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -3 & 1 & 0 & 0 \\ 2 & 0 & 1 & 0 \\ -1 & 4 & 0 & 1 \end{pmatrix},\quad
P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]

A final row interchange places the matrix in upper triangular form:

\[
U = A = \begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & 1 & 3 & 1 \\ 0 & 0 & -5 & -6 \\ 0 & 0 & 0 & -1 \end{pmatrix},\quad
L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -3 & 1 & 0 & 0 \\ -1 & 4 & 1 & 0 \\ 2 & 0 & 0 & 1 \end{pmatrix},\quad
P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix}.
\]

Again, we performed the same row interchange on P, while interchanging only the third and fourth row entries of L that lie below the diagonal. You can verify that

\[
P A = \begin{pmatrix} 1 & 2 & -1 & 0 \\ -3 & -5 & 6 & 1 \\ -1 & 2 & 8 & -2 \\ 2 & 4 & -2 & -1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 & 0 \\ -3 & 1 & 0 & 0 \\ -1 & 4 & 1 & 0 \\ 2 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & 1 & 3 & 1 \\ 0 & 0 & -5 & -6 \\ 0 & 0 & 0 & -1 \end{pmatrix}
= L U,
\qquad (1.34)
\]



as  promised.  Thus,  by  rearranging  the  equations  in  the  order  first,  third,  fourth,  second, 
as  prescribed  by  P,  we  obtain  an  equivalent  linear  system  whose  coefficient  matrix  PA  is 
regular,  in  accordance  with  Theorem  1.11. 

Once the permuted LU factorization is established, the solution to the original system Ax = b is obtained by applying the same Forward and Back Substitution algorithm presented above. Explicitly, we first multiply the system Ax = b by the permutation matrix, leading to

    P A x = P b = \hat{b},        (1.35)

whose right-hand side \hat{b} has been obtained by permuting the entries of b in the same fashion as the rows of A. We then solve the two triangular systems

    L c = \hat{b}    and    U x = c        (1.36)

by, respectively, Forward and Back Substitution, as before.
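
The two-stage solution procedure can be sketched as follows. The function name `solve_permuted_lu` is an assumption, and the factors P, L, U are entered by hand from the factorization of Example 1.12; the right-hand side b = (1, −1, 3, 0) is chosen here purely for illustration.

```python
from fractions import Fraction

def solve_permuted_lu(P, L, U, b):
    """Solve A x = b given P A = L U, with L unit lower triangular."""
    n = len(b)
    # Permute the right-hand side: b_hat = P b.
    b_hat = [sum(Fraction(P[i][j]) * b[j] for j in range(n)) for i in range(n)]
    # Forward Substitution for L c = b_hat (diagonal entries of L are 1).
    c = []
    for i in range(n):
        c.append(b_hat[i] - sum(L[i][j] * c[j] for j in range(i)))
    # Back Substitution for U x = c.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / U[i][i]
    return c, x

P = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 1, 0, 0]]
L = [[1, 0, 0, 0], [-3, 1, 0, 0], [-1, 4, 1, 0], [2, 0, 0, 1]]
U = [[1, 2, -1, 0], [0, 1, 3, 1], [0, 0, -5, -6], [0, 0, 0, -1]]
b = [1, -1, 3, 0]

c, x = solve_permuted_lu(P, L, U, b)
print(c)  # [Fraction(1, 1), Fraction(6, 1), Fraction(-23, 1), Fraction(-3, 1)]
print(x)  # [Fraction(2, 1), Fraction(0, 1), Fraction(1, 1), Fraction(3, 1)]
```

Note that the factorization is computed once and can then be reused for any number of right-hand sides; only the two inexpensive substitution passes are repeated.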

Example 1.12 (continued). Suppose we wish to solve the linear system

\[
\begin{pmatrix} 1 & 2 & -1 & 0 \\ 2 & 4 & -2 & -1 \\ -3 & -5 & 6 & 1 \\ -1 & 2 & 8 & -2 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}
= \begin{pmatrix} 1 \\ -1 \\ 3 \\ 0 \end{pmatrix}.
\]

In view of the PA = LU factorization established in (1.34), we need only solve the two auxiliary lower and upper triangular systems (1.36). The lower triangular system is

\[
\begin{pmatrix} 1 & 0 & 0 & 0 \\ -3 & 1 & 0 & 0 \\ -1 & 4 & 1 & 0 \\ 2 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}
= \begin{pmatrix} 1 \\ 3 \\ 0 \\ -1 \end{pmatrix},
\]

whose right-hand side was obtained by applying the permutation matrix P to the right-hand side of the original system. Its solution, namely a = 1, b = 6, c = −23, d = −3, is obtained through Forward Substitution. The resulting upper triangular system is

\[
\begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & 1 & 3 & 1 \\ 0 & 0 & -5 & -6 \\ 0 & 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}
= \begin{pmatrix} 1 \\ 6 \\ -23 \\ -3 \end{pmatrix}.
\]

Its solution, w = 3, z = 1, y = 0, x = 2, which is also the solution to the original system, is easily obtained by Back Substitution.


Exercises 


1.4.21. For each of the listed matrices A and vectors b, find a permuted LU factorization of the matrix, and use your factorization to solve the system Ax = b. (a)

1.4.22. For each of the following linear systems find a permuted LU factorization of the coefficient matrix and then use it to solve the system by Forward and Back Substitution.


(a)
     4x1 − 4x2 + 2x3 = 1,
    −3x1 + 3x2 +  x3 = 3,
    −3x1 +  x2 − 2x3 = −5.

(b)
         y −  z +  w = 0,
         y +  z      = 1,
    x −  y +  z − 3w = 2,
    x + 2y −  z +  w = 4.

(c)
     x −  y + 2z +  w = 0,
    −x +  y − 3z      = 1,
     x −  y + 4z − 3w = 2,
     x + 2y −  z +  w = 4.


♦ 1.4.23. (a) Explain why

are all legitimate permuted LU factorizations of the same matrix. List the elementary row operations that are being used in each case.

/  0  1  3\ 

(b)  Use  each  of  the  factorizations  to  solve  the  linear  system  2  —1  1 

\  2  -2  0/ 

Do  you  always  obtain  the  same  result?  Explain  why  or  why  not. 


(  x^ 

y 

W 


( 


V 


1.4.24.  (a)  Find  three  different  permuted  LU  factorizations  of  the  matrix  A  = 
(b)  How  many  different  permuted  LU  factorizations  does  A  have? 


(0 

1 

Vi 
1.4.25. What is the maximal number of permuted LU factorizations a regular 3 × 3 matrix can have? Give an example of such a matrix.

1.4.26.  True  or  false:  The  pivots  of  a  nonsingular  matrix  are  uniquely  defined. 

♠ 1.4.27. (a) Write a pseudocode program implementing the algorithm for finding the permuted LU factorization of a matrix. (b) Program your algorithm and test it on the examples in Exercise 1.4.21.


1.5  Matrix  Inverses 

The inverse of a matrix is analogous to the reciprocal a^{-1} = 1/a of a nonzero scalar a ≠ 0. We already encountered the inverses of matrices corresponding to elementary row operations. In this section, we will study inverses of general square matrices. We begin with the formal definition.

Definition 1.13. Let A be a square matrix of size n × n. An n × n matrix X is called the inverse of A if it satisfies

    X A = I = A X,        (1.37)

where I = I_n is the n × n identity matrix. The inverse of A is commonly denoted by A^{-1}.


Remark. Noncommutativity of matrix multiplication requires that we impose both conditions in (1.37) in order to properly define an inverse to the matrix A. The first condition, X A = I, says that X is a left inverse, while the second, A X = I, requires that X also be a right inverse. Rectangular matrices might have either a left inverse or a right inverse, but, as we shall see, only square matrices have both, and so only square matrices can have full-fledged inverses. However, not every square matrix has an inverse. Indeed, not every scalar has an inverse: 0^{-1} = 1/0 is not defined, since the equation 0x = 1 has no solution.


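Example 1.14. Since

\[
\begin{pmatrix} 1 & 2 & -1 \\ -3 & 1 & 2 \\ -2 & 2 & 1 \end{pmatrix}
\begin{pmatrix} 3 & 4 & -5 \\ 1 & 1 & -1 \\ 4 & 6 & -7 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 3 & 4 & -5 \\ 1 & 1 & -1 \\ 4 & 6 & -7 \end{pmatrix}
\begin{pmatrix} 1 & 2 & -1 \\ -3 & 1 & 2 \\ -2 & 2 & 1 \end{pmatrix},
\]

we conclude that when

\[
A = \begin{pmatrix} 1 & 2 & -1 \\ -3 & 1 & 2 \\ -2 & 2 & 1 \end{pmatrix},
\quad\text{then}\quad
A^{-1} = \begin{pmatrix} 3 & 4 & -5 \\ 1 & 1 & -1 \\ 4 & 6 & -7 \end{pmatrix}.
\]

Observe that there is no obvious way to anticipate the entries of A^{-1} from the entries of A.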
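
Both conditions in (1.37) for this pair of matrices can be confirmed by direct multiplication. The short Python check below is a sketch; the helper `matmul` is an illustrative assumption.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2, -1], [-3, 1, 2], [-2, 2, 1]]
X = [[3, 4, -5], [1, 1, -1], [4, 6, -7]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

left = matmul(X, A)    # left inverse condition  X A = I
right = matmul(A, X)   # right inverse condition A X = I
print(left == I3, right == I3)  # True True
```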


Example 1.15. Let us compute the inverse X = \begin{pmatrix} x & y \\ z & w \end{pmatrix}, when it exists, of a general 2 × 2 matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. The right inverse condition

\[
A X = \begin{pmatrix} ax + bz & ay + bw \\ cx + dz & cy + dw \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I
\]

holds if and only if x, y, z, w satisfy the linear system

    ax + bz = 1,    ay + bw = 0,
    cx + dz = 0,    cy + dw = 1.

Solving by Gaussian Elimination (or directly), we find

\[
x = \frac{d}{ad - bc},\quad y = -\frac{b}{ad - bc},\quad z = -\frac{c}{ad - bc},\quad w = \frac{a}{ad - bc},
\]

provided the common denominator ad − bc ≠ 0 does not vanish. Therefore, the matrix

\[
X = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
\]

forms a right inverse to A. However, a short computation shows that it also defines a left inverse:

\[
X A = \begin{pmatrix} xa + yc & xb + yd \\ za + wc & zb + wd \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I,
\]

and hence X = A^{-1} is the inverse of A.

The denominator appearing in the preceding formulas has a special name; it is called the determinant of the 2 × 2 matrix A, and denoted by

\[
\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc.
\qquad (1.38)
\]

Thus, the determinant of a 2 × 2 matrix is the product of the diagonal entries minus the product of the off-diagonal entries. (Determinants of larger square matrices will be discussed in Section 1.9.) Thus, the 2 × 2 matrix A is invertible, with

\[
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},
\qquad (1.39)
\]

if and only if det A ≠ 0. For example, if A = \begin{pmatrix} 1 & 3 \\ -2 & -4 \end{pmatrix}, then det A = 2 ≠ 0. We conclude that A has an inverse, which, by (1.39), is

\[
A^{-1} = \frac{1}{2} \begin{pmatrix} -4 & -3 \\ 2 & 1 \end{pmatrix}
= \begin{pmatrix} -2 & -\frac{3}{2} \\ 1 & \frac{1}{2} \end{pmatrix}.
\]
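
The 2 × 2 inverse formula (1.39) translates directly into code. The following Python sketch, with the assumed function name `inverse_2x2`, uses exact fractions and rejects matrices whose determinant vanishes; it is applied to the matrix with rows (1, 3) and (−2, −4), whose determinant is 2.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix via formula (1.39); fails if det A = 0."""
    (a, b), (c, d) = A
    det = a * d - b * c                    # determinant (1.38)
    if det == 0:
        raise ValueError("matrix is singular: determinant is zero")
    det = Fraction(det)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 3], [-2, -4]]
A_inv = inverse_2x2(A)
print(A_inv)
# [[Fraction(-2, 1), Fraction(-3, 2)], [Fraction(1, 1), Fraction(1, 2)]]
```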


Example 1.16. We already learned how to find the inverse of an elementary matrix of type #1: we just negate the one nonzero off-diagonal entry. For example, if

\[
E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{pmatrix},
\quad\text{then}\quad
E^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix}.
\]

This is because the inverse of the elementary row operation that adds twice the first row to the third row is the operation of subtracting twice the first row from the third row.

Example 1.17. Let P = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} denote the elementary matrix that has the effect of interchanging rows 1 and 2 of a 3-rowed matrix. Then P^2 = I, since performing the interchange twice returns us to where we began. This implies that P^{-1} = P is its own inverse. Indeed, the same result holds for all elementary permutation matrices that correspond to row operations of type #2. However, it is not true for more general permutation matrices.
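
A quick numerical illustration of the last two statements: the interchange matrix squares to the identity, whereas the non-elementary permutation matrix taken from the second entry of (1.30) does not. The helper `matmul` and the variable names below are illustrative assumptions.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
swap12 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # elementary interchange matrix
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]    # non-elementary permutation matrix

swap_squared = matmul(swap12, swap12)
cycle_squared = matmul(cycle, cycle)
cycle_cubed = matmul(cycle_squared, cycle)

print(swap_squared == I3)   # True:  the interchange is its own inverse
print(cycle_squared == I3)  # False: this permutation is not its own inverse
print(cycle_cubed == I3)    # True:  here the inverse is the square
```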


The following fundamental result will be established later in this chapter.

Theorem 1.18. A square matrix has an inverse if and only if it is nonsingular.

Consequently, an n × n matrix will have an inverse if and only if it can be reduced to upper triangular form, with n nonzero pivots on the diagonal, by a combination of elementary row operations. Indeed, “invertible” is often used as a synonym for “nonsingular”. All other matrices are singular and do not have an inverse as defined above. Before attempting to prove Theorem 1.18, we need first to become familiar with some elementary properties of matrix inverses.

Lemma  1.19.  The  inverse  of  a  square  matrix,  if  it  exists,  is  unique. 

Proof: Suppose both X and Y satisfy (1.37), so

    X A = I = A X    and    Y A = I = A Y.

Then, by associativity,

    X = X I = X (A Y) = (X A) Y = I Y = Y.        Q.E.D.

Inverting a matrix twice brings us back to where we started.

Lemma 1.20. If A is an invertible matrix, then A^{-1} is also invertible and (A^{-1})^{-1} = A.

Proof: The matrix inverse equations A^{-1} A = I = A A^{-1} are sufficient to prove that A is the inverse of A^{-1}. Q.E.D.


Lemma 1.21. If A and B are invertible matrices of the same size, then their product, A B, is invertible, and

    (A B)^{-1} = B^{-1} A^{-1}.        (1.40)

Note that the order of the factors is reversed under inversion.

Proof: Let X = B^{-1} A^{-1}. Then, by associativity,

    X (A B) = B^{-1} A^{-1} A B = B^{-1} I B = B^{-1} B = I,

    (A B) X = A B B^{-1} A^{-1} = A I A^{-1} = A A^{-1} = I.

Thus X is both a left and a right inverse for the product matrix A B. Q.E.D.


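Example 1.22. One verifies, directly, that the inverse of A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} is A^{-1} = \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}, while the inverse of B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} is B^{-1} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. Therefore, the inverse of their product

\[
C = A B = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
= \begin{pmatrix} -2 & 1 \\ -1 & 0 \end{pmatrix}
\]

is given by

\[
C^{-1} = B^{-1} A^{-1} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & -1 \\ 1 & -2 \end{pmatrix}.
\]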


We can straightforwardly generalize the preceding result. The inverse of a k-fold product
of invertible matrices is the product of their inverses, in the reverse order:

$$(A_1 A_2 \cdots A_{k-1} A_k)^{-1} = A_k^{-1} A_{k-1}^{-1} \cdots A_2^{-1} A_1^{-1}. \tag{1.41}$$

Warning. In general, $(A + B)^{-1} \ne A^{-1} + B^{-1}$. Indeed, this equation is not even true for
scalars (1 × 1 matrices)!
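
The reversal of order in the product formula, and the failure of the corresponding formula for
sums, can be checked numerically. The following is only an illustrative sketch: the helper
names `matmul`, `add` and `inverse_2x2` are our own, and `inverse_2x2` relies on the standard
explicit formula for the inverse of a 2 × 2 matrix (an assumption brought in for the
illustration, not something derived in this passage). The matrices are the pair
A = (1 2; 0 1), B = (0 1; -1 0).

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    """Add two matrices of the same size entry by entry."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def inverse_2x2(M):
    """Invert a 2 x 2 matrix using the explicit formula with determinant ad - bc."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [0, 1]]
B = [[0, 1], [-1, 0]]

lhs = inverse_2x2(matmul(A, B))                       # (AB)^{-1}
rhs = matmul(inverse_2x2(B), inverse_2x2(A))          # B^{-1} A^{-1}
wrong_order = matmul(inverse_2x2(A), inverse_2x2(B))  # A^{-1} B^{-1}

print(lhs == rhs)          # the reversed-order product agrees with (AB)^{-1}
print(lhs == wrong_order)  # the unreversed product does not

# The analogous rule for sums fails for this pair as well.
print(inverse_2x2(add(A, B)) == add(inverse_2x2(A), inverse_2x2(B)))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equality tests are not
disturbed by rounding.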
Exercises


1.5.1.  Verify  by  direct  multiplication  that  the  following  matrices  are  inverses,  i.e.,  both 

conditions  in  (1.37)  hold:  (a)  A  =  ^  ^  3^j,  A-1  =  ^  J  3^;  (b)  A=  3  2  1 


A-1  = 


(  3 

-1 

~i\ 

/- 1 

3 

2\ 

/ 

- 1 

1 

1) 

-4 

2 

i 

;  (0  A  = 

2 

2 

-1 

,  A'1  = 

4 

7 

1 

7 

3 

7 

V-1 

0 

ij 

V-2 

1 

V 

6 

7 

5 

7 

7  7 

/I 


1.5.2. Let $A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 3 \\ 1 & -1 & -8 \end{pmatrix}$. Find the right inverse of A by setting up and solving the linear
system AX = I. Verify that the resulting matrix X is also a left inverse.


1.5.3.  Write  down  the  inverse  of  each  of  the  following  elementary  matrices:  (a) 


0 

1 


1 

0 


( b ) 


1  0 
5  1 


>  (c)  ( 


1  -2 

0  1 


\ 

(  1 

0 

0\ 

).  (d) 

0 

1 

-3 

.  (e) 

/ 

Vo 

0 

i/ 

/I 

0 

0 

0\ 

/o 

0 

0 

1\ 

0 

1 

0 

0 

,  (0 

0 

1 

0 

0 

0 

6 

1 

0 

0 

0 

1 

0 

Vo 

0 

0 

1 ) 

Vi 

0 

0 

C# 

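1.5.4. Show that the inverse of $L = \begin{pmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & 0 & 1 \end{pmatrix}$ is
$L^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -a & 1 & 0 \\ -b & 0 & 1 \end{pmatrix}$. However, the inverse
of $M = \begin{pmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & c & 1 \end{pmatrix}$ is not
$\begin{pmatrix} 1 & 0 & 0 \\ -a & 1 & 0 \\ -b & -c & 1 \end{pmatrix}$. What is $M^{-1}$?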


1.5.5.  Explain  why  a  matrix  with  a  row  of  all  zeros  does  not  have  an  inverse. 

1  i  \  (  \ 

1.5.6.  (a)  Write  down  the  inverse  of  the  matrices  A  =  (  ^  ^  1  and  B  = 


1  2  ,  •  (b)  Write 

down  the  product  matrix  C  =  AB  and  its  inverse  C~L  using  the  inverse  product  formula. 


2  1 

i — l 


1.5.7. (a) Find the inverse of the rotation matrix
$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$, where $\theta \in \mathbb{R}$.
(b) Use your result to solve the system $x = a \cos\theta - b \sin\theta$, $y = a \sin\theta + b \cos\theta$, for a and b
in terms of x and y. (c) Prove that, for all $a \in \mathbb{R}$ and $0 < \theta < \pi$, the matrix $R_\theta - a I$ has
an inverse.


1.5.8. (a) Write down the inverses of each of the 3 × 3 permutation matrices (1.30). (b) Which
ones are their own inverses, $P^{-1} = P$? (c) Can you find a non-elementary permutation
matrix P that is its own inverse: $P^{-1} = P$?


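1.5.9. Find the inverse of the following permutation matrices:

(a) $\begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}$,
(b) $\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix}$,
(c) $\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$,
(d) $\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$.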


1.5.10.  Explain  how  to  write  down  the  inverse  permutation  using  the  notation  of  Exercise 
1.4.17.  Apply  your  method  to  the  examples  in  Exercise  1.5.9,  and  check  the  result  by 
verifying  that  it  produces  the  inverse  permutation  matrix. 

1.5.11. Find all real 2 × 2 matrices that are their own inverses: $A^{-1} = A$.

1.5.12. Show that if a square matrix A satisfies $A^2 - 3A + I = O$, then $A^{-1} = 3I - A$.

1.5.13. Prove that if $c \ne 0$ is any nonzero scalar and A is an invertible matrix, then the scalar
product matrix cA is invertible, and $(cA)^{-1} = \frac{1}{c} A^{-1}$.


1.5.14. Show that $A = \begin{pmatrix} 0 & a & 0 & 0 & 0 \\ b & 0 & c & 0 & 0 \\ 0 & d & 0 & e & 0 \\ 0 & 0 & f & 0 & g \\ 0 & 0 & 0 & h & 0 \end{pmatrix}$ is not invertible for any value of the entries.

1.5.15. Show that if A is a nonsingular matrix, so is every power $A^n$.

1.5.16. Prove that a diagonal matrix $D = \operatorname{diag}(d_1, \ldots, d_n)$ is invertible if and only if all its
diagonal entries are nonzero, in which case $D^{-1} = \operatorname{diag}(1/d_1, \ldots, 1/d_n)$.

1.5.17. Prove that if U is a nonsingular upper triangular matrix, then the diagonal entries of
$U^{-1}$ are the reciprocals of the diagonal entries of U.

♦ 1.5.18. (a) Let U be an m × n matrix and V an n × m matrix, such that the m × m matrix
$I_m + UV$ is invertible. Prove that $I_n + VU$ is also invertible, and its inverse is given by

$$(I_n + VU)^{-1} = I_n - V (I_m + UV)^{-1} U.$$

(b) The Sherman-Morrison-Woodbury formula generalizes this identity to

$$(A + V B U)^{-1} = A^{-1} - A^{-1} V (B^{-1} + U A^{-1} V)^{-1} U A^{-1}. \tag{1.42}$$

Explain what assumptions must be made on the matrices A, B, U, V for (1.42) to be valid.

♦ 1.5.19. Two matrices A and B are said to be similar, written A ~ B, if there exists an
invertible matrix S such that $B = S^{-1} A S$. Prove: (a) A ~ A. (b) If A ~ B, then B ~ A.
(c) If A ~ B and B ~ C, then A ~ C.

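♥ 1.5.20. (a) A block matrix $D = \begin{pmatrix} A & O \\ O & B \end{pmatrix}$ is called block diagonal if A and B are square
matrices, not necessarily of the same size, while the O's are zero matrices of the
appropriate sizes. Prove that D has an inverse if and only if both A and B do, and
$D^{-1} = \begin{pmatrix} A^{-1} & O \\ O & B^{-1} \end{pmatrix}$. (b) Find the inverse of
$\begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}$ and
$\begin{pmatrix} 1 & -1 & 0 & 0 \\ 2 & -1 & 0 & 0 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 2 & 5 \end{pmatrix}$ by
using this method.

1.5.21. (a) Show that $B = \begin{pmatrix} 1 & 1 & 0 \\ -1 & -1 & 1 \end{pmatrix}$ is a left inverse of
$A = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}$. (b) Show that
A does not have a right inverse. (c) Can you find any other left inverses of A?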


1.5.22. Prove that the rectangular matrix $A = \begin{pmatrix} 1 & 2 & -1 \\ 1 & 2 & 0 \end{pmatrix}$ has a right inverse, but no left
inverse.

1.5.23. (a) Are there any nonzero real scalars that satisfy $(a + b)^{-1} = a^{-1} + b^{-1}$?
(b) Are there any nonsingular real 2 × 2 matrices that satisfy $(A + B)^{-1} = A^{-1} + B^{-1}$?


Gauss-Jordan Elimination

The principal algorithm used to compute the inverse of a nonsingular matrix is known as
Gauss-Jordan Elimination, in honor of Gauss and Wilhelm Jordan, a nineteenth-century
German engineer. A key fact is that, given that A is square, we need to solve only the
right inverse equation

$$A X = I \tag{1.43}$$

in order to compute $X = A^{-1}$. The left inverse equation in (1.37), namely XA = I,
will then follow as an automatic consequence. In other words, for square matrices, a right
inverse is automatically a left inverse, and conversely! A proof will appear below.

The  reader  may  well  ask,  then,  why  use  both  left  and  right  inverse  conditions  in  the 
original definition? There are several good reasons. First of all, a non-square matrix
may satisfy one of the two conditions (having either a left inverse or a right inverse)
but can never satisfy both. Moreover, even when we restrict our attention to square
matrices,  starting  with  only  one  of  the  conditions  makes  the  logical  development  of  the 
subject  considerably  more  difficult,  and  not  really  worth  the  extra  effort.  Once  we  have 
established  the  basic  properties  of  the  inverse  of  a  square  matrix,  we  can  then  safely  discard 
the  superfluous  left  inverse  condition.  Finally,  when  we  generalize  the  notion  of  an  inverse 
to  linear  operators  in  Chapter  7,  then,  in  contrast  to  the  case  of  square  matrices,  we  cannot 
dispense  with  either  of  the  conditions. 

Let us write out the individual columns of the right inverse equation (1.43). The jth
column of the n × n identity matrix I is the vector $e_j$ that has a 1 in the jth slot and 0's
elsewhere, so

$$e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad
e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad \ldots \qquad
e_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. \tag{1.44}$$

According to (1.11), the jth column of the matrix product AX is equal to $A x_j$, where
$x_j$ denotes the jth column of the inverse matrix X. Therefore, the single matrix equation
(1.43) is equivalent to n linear systems

$$A x_1 = e_1, \qquad A x_2 = e_2, \qquad \ldots \qquad A x_n = e_n, \tag{1.45}$$


all having the same coefficient matrix. As such, to solve them we should form the n
augmented matrices $M_1 = (A \mid e_1), \ldots, M_n = (A \mid e_n)$, and then apply our Gaussian
Elimination algorithm to each. But this would be a waste of effort. Since the coefficient
matrix is the same, we will end up performing identical row operations on each augmented
matrix. Clearly, it will be more efficient to combine them into one large augmented matrix
$M = (A \mid e_1 \ldots e_n) = (A \mid I)$, of size n × (2n), in which the right-hand sides $e_1, \ldots, e_n$
of our systems are placed into n different columns, which we then recognize as reassembling
the columns of an n × n identity matrix. We may then simultaneously apply our elementary
row operations to reduce, if possible, the large augmented matrix so that its first n columns
are in upper triangular form.

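Example 1.23. For example, to find the inverse of the matrix
$A = \begin{pmatrix} 0 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}$, we
form the large augmented matrix

$$\left(\begin{array}{ccc|ccc} 0 & 2 & 1 & 1 & 0 & 0 \\ 2 & 6 & 1 & 0 & 1 & 0 \\ 1 & 1 & 4 & 0 & 0 & 1 \end{array}\right).$$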

Applying the same sequence of elementary row operations as in Section 1.4, we first
interchange the rows

$$\left(\begin{array}{ccc|ccc} 2 & 6 & 1 & 0 & 1 & 0 \\ 0 & 2 & 1 & 1 & 0 & 0 \\ 1 & 1 & 4 & 0 & 0 & 1 \end{array}\right),$$
and then eliminate the nonzero entries below the first pivot,

$$\left(\begin{array}{ccc|ccc} 2 & 6 & 1 & 0 & 1 & 0 \\ 0 & 2 & 1 & 1 & 0 & 0 \\ 0 & -2 & \frac{7}{2} & 0 & -\frac{1}{2} & 1 \end{array}\right).$$

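Next we eliminate the entry below the second pivot:

$$\left(\begin{array}{ccc|ccc} 2 & 6 & 1 & 0 & 1 & 0 \\ 0 & 2 & 1 & 1 & 0 & 0 \\ 0 & 0 & \frac{9}{2} & 1 & -\frac{1}{2} & 1 \end{array}\right).$$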

At this stage, we have reduced our augmented matrix to the form $(U \mid C)$, where U is
upper triangular. This is equivalent to reducing the original n linear systems $A x_i = e_i$ to
n upper triangular systems $U x_i = c_i$. We can therefore perform n back substitutions to
produce the solutions $x_i$, which would form the individual columns of the inverse matrix
$X = (x_1 \ldots x_n)$. In the more common version of the Gauss-Jordan scheme, one instead
continues to employ elementary row operations to fully reduce the augmented matrix. The
goal is to produce an augmented matrix $(I \mid X)$ in which the left-hand n × n matrix has
become the identity, while the right-hand matrix is the desired solution $X = A^{-1}$. Indeed,
$(I \mid X)$ represents the n trivial linear systems $I x = x_i$ whose solutions $x = x_i$ are the
columns of the inverse matrix X.

Now, the identity matrix has 0's below the diagonal, just like U. It also has 1's along
the diagonal, whereas U has the pivots (which are all nonzero) along the diagonal. Thus,
the next phase in the reduction process is to make all the diagonal entries of U equal to 1.
To proceed, we need to introduce the last, and least, of our linear systems operations.


Linear  System  Operation  #3: 


Multiply  an  equation  by  a  nonzero  constant. 


This  operation  clearly  does  not  affect  the  solution,  and  so  yields  an  equivalent  linear  system. 
The  corresponding  elementary  row  operation  is: 


Elementary  Row  Operation  #3: 


Multiply  a  row  of  the  matrix  by  a  nonzero  scalar. 


Dividing the rows of the upper triangular augmented matrix $(U \mid C)$ by the diagonal
pivots of U will produce a matrix of the form $(V \mid B)$, where V is upper unitriangular,
meaning it has all 1's along the diagonal. In our particular example, the result of these
three elementary row operations of type #3 is

$$\left(\begin{array}{ccc|ccc} 1 & 3 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ 0 & 1 & \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 1 & \frac{2}{9} & -\frac{1}{9} & \frac{2}{9} \end{array}\right),$$

where we multiplied the first and second rows by $\frac{1}{2}$ and the third row by $\frac{2}{9}$.

We are now over halfway towards our goal. We need only make the entries above
the diagonal of the left-hand matrix equal to zero. This can be done by elementary row
operations of type #1, but now we work backwards. First, we eliminate the nonzero entries
in the third column lying above the (3, 3) entry by subtracting one half the third row from
the second and also from the first:

$$\left(\begin{array}{ccc|ccc} 1 & 3 & 0 & -\frac{1}{9} & \frac{5}{9} & -\frac{1}{9} \\ 0 & 1 & 0 & \frac{7}{18} & \frac{1}{18} & -\frac{1}{9} \\ 0 & 0 & 1 & \frac{2}{9} & -\frac{1}{9} & \frac{2}{9} \end{array}\right).$$
Finally, we subtract 3 times the second row from the first to eliminate the remaining
nonzero off-diagonal entry, thereby completing the Gauss-Jordan procedure:

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\frac{23}{18} & \frac{7}{18} & \frac{2}{9} \\ 0 & 1 & 0 & \frac{7}{18} & \frac{1}{18} & -\frac{1}{9} \\ 0 & 0 & 1 & \frac{2}{9} & -\frac{1}{9} & \frac{2}{9} \end{array}\right).$$

The left-hand matrix is the identity, and therefore the final right-hand matrix is our desired
inverse:

$$A^{-1} = \begin{pmatrix} -\frac{23}{18} & \frac{7}{18} & \frac{2}{9} \\ \frac{7}{18} & \frac{1}{18} & -\frac{1}{9} \\ \frac{2}{9} & -\frac{1}{9} & \frac{2}{9} \end{pmatrix}. \tag{1.46}$$

The reader may wish to verify that the final result does satisfy both inverse conditions
$A A^{-1} = I = A^{-1} A$.
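
The three phases of the procedure (reduction to upper triangular form, division by the pivots,
and backwards elimination above the diagonal) can be sketched in a few lines of code. This is
a minimal illustration under our own assumptions, not a production routine: the function name
`gauss_jordan_inverse` is hypothetical, matrices are stored as lists of rows, exact rational
arithmetic is used, and a row interchange is performed only when a zero sits in the pivot
position.

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Return the inverse of a square matrix A by reducing (A | I) to (I | X)."""
    n = len(A)
    # Form the augmented matrix (A | I) with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]

    # Phase 1: reduce the left-hand block to upper triangular form.
    for j in range(n):
        # Look for a row at or below position j with a nonzero entry in column j.
        p = next((r for r in range(j, n) if M[r][j] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        if p != j:
            M[j], M[p] = M[p], M[j]                      # row interchange
        for r in range(j + 1, n):
            m = M[r][j] / M[j][j]
            M[r] = [x - m * y for x, y in zip(M[r], M[j])]

    # Phase 2: divide each row by its pivot (row operation of type #3).
    for j in range(n):
        pivot = M[j][j]
        M[j] = [x / pivot for x in M[j]]

    # Phase 3: clear the entries above the diagonal, working backwards.
    for j in range(n - 1, -1, -1):
        for r in range(j):
            m = M[r][j]
            M[r] = [x - m * y for x, y in zip(M[r], M[j])]

    # The right-hand block now holds the inverse.
    return [row[n:] for row in M]

A = [[0, 2, 1], [2, 6, 1], [1, 1, 4]]
for row in gauss_jordan_inverse(A):
    print([str(x) for x in row])
```

For the matrix of Example 1.23 the sketch performs the same sequence of row operations as the
hand computation, so its output can be compared entry by entry with (1.46).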


We are now able to complete the proofs of the basic results on inverse matrices. First,
we need to determine the elementary matrix corresponding to an elementary row operation
of type #3. Again, this is obtained by performing the row operation in question on the
identity matrix. Thus, the elementary matrix that multiplies row i by the nonzero scalar
c is the diagonal matrix having c in the ith diagonal position, and 1's elsewhere along the
diagonal. The inverse elementary matrix is the diagonal matrix with 1/c in the ith diagonal
position and 1's elsewhere on the main diagonal; it corresponds to the inverse operation
that divides row i by c. For example, the elementary matrix that multiplies the second
row of a 3-rowed matrix by 5 is
$E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix}$; its inverse is
$E^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{5} & 0 \\ 0 & 0 & 1 \end{pmatrix}$.
In summary:

Lemma 1.24. Every elementary matrix is nonsingular, and its inverse is also an
elementary matrix of the same type.


The Gauss-Jordan method tells us how to reduce any nonsingular square matrix A to
the identity matrix by a sequence of elementary row operations. Let $E_1, E_2, \ldots, E_N$ be
the corresponding elementary matrices. The elimination procedure that reduces A to I
amounts to multiplying A by a succession of elementary matrices:

$$E_N E_{N-1} \cdots E_2 E_1 A = I. \tag{1.47}$$

We claim that the product matrix

$$X = E_N E_{N-1} \cdots E_2 E_1 \tag{1.48}$$

is the inverse of A. Indeed, formula (1.47) says that XA = I, and so X is a left inverse.
Furthermore, each elementary matrix has an inverse, and so by (1.41), X itself is invertible,
with

$$X^{-1} = E_1^{-1} E_2^{-1} \cdots E_{N-1}^{-1} E_N^{-1}. \tag{1.49}$$

Therefore, multiplying formula (1.47), namely XA = I, on the left by $X^{-1}$ leads to
$A = X^{-1}$. Lemma 1.20 implies $X = A^{-1}$, as claimed, completing the proof of Theorem 1.18.
Finally, equating $A = X^{-1}$ to the product (1.49), and invoking Lemma 1.24, we have
established the following result.

Proposition 1.25. Every nonsingular matrix can be written as the product of elementary
matrices.


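Example 1.26. The 2 × 2 matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & 3 \end{pmatrix}$ is converted into the identity matrix
by first interchanging its rows, $\begin{pmatrix} 1 & 3 \\ 0 & -1 \end{pmatrix}$, then scaling the second row by -1,
$\begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix}$, and, finally, subtracting 3 times the second row from the first to obtain
$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I$. The corresponding elementary matrices are

$$E_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad E_2 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad E_3 = \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}.$$

Therefore, by (1.48),

$$A^{-1} = E_3 E_2 E_1 = \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ -1 & 0 \end{pmatrix},$$

while

$$A = E_1^{-1} E_2^{-1} E_3^{-1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 3 \end{pmatrix}.$$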

As  an  application,  let  us  prove  that  the  inverse  of  a  nonsingular  triangular  matrix  is 
also  triangular.  Specifically: 


Proposition 1.27. If L is a lower triangular matrix with all nonzero entries on the main
diagonal, then L is nonsingular and its inverse $L^{-1}$ is also lower triangular. In particular,
if L is lower unitriangular, so is $L^{-1}$. A similar result holds for upper triangular matrices.


Proof: It suffices to note that if L has all nonzero diagonal entries, one can reduce L to the
identity by elementary row operations of types #1 and #3, whose associated elementary
matrices are all lower triangular. Lemma 1.2 implies that the product (1.48) is then
also lower triangular. If L is unitriangular, then all the pivots are equal to 1. Thus, no
elementary row operations of type #3 are required, and so L can be reduced to the identity
matrix by elementary row operations of type #1 alone. Therefore, its inverse is a product
of lower unitriangular matrices, and hence is itself lower unitriangular. A similar argument
applies in the upper triangular case. Q.E.D.


Exercises 


1.5.24.  (a)  Write  down  the  elementary  matrix  that  multiplies  the  third  row  of  a  4  x  4  matrix 
by  7.  (b)  Write  down  its  inverse. 


1.5.25. Find the inverse of each of the following matrices, if possible, by applying the
Gauss-Jordan Method.

1  -2 
3  -3 


(a) 


(b) 


(c) 


(l  2  3\ 

(2  1  2\ 

( f ) 

3  5  5  ,  (g) 

4  2  3 

,  W 

2  1  2 / 

\°  -i  1/ 

4  \ 

f1  2 

3  \ 

(  1 

0 

—2  \ 

5  1 

3  1  5 

( d ) 

4  5 

6 

1 

(e) 

3 

-1 

0 

V 

^  7  8 

9  7 
1
-2
1.5.26. Write each of the matrices in Exercise 1.5.25 as a product of elementary matrices.

/at  ,  \ 


1.5.27.  Express  A  = 


V 


VI 

2 

1 

2 


1 
2 

V3 
2  ) 


as  a  product  of  elementary  matrices. 


1.5.28.  Use  the  Gauss-Jordan  Method  to  find  the  inverse  of  the  following  complex  matrices: 


(a) 


i  1 
1  i 


(b) 


1  1  -  i 

1  +  i  1 


(c) 


/  0  1  -i \ 

i  0  -1 

V-l  i  1/ 


0 d ) 


1.5.29.  Can  two  nonsingular  linear  systems  have  the  same  solution  and  yet  not  be  equivalent? 

♦ 1.5.30. (a) Suppose $\tilde{A}$ is obtained from A by applying an elementary row operation. Let
C = AB, where B is any matrix of the appropriate size. Explain why $\tilde{C} = \tilde{A} B$ can be
obtained by applying the same elementary row operation to C. (b) Illustrate by adding


—2  times  the  first  row  to  the  third  row  of  A  = 


result  on  the  right  by  B 


(  1 

3 

V-i 


(1 

2 

-i\ 

2 

-3 

2 

and  then  multiplying  the 

\o 

1 

-4y 

0  | .  Check  that  the  resulting  matrix  is  the  same  as  first 
1 


multiplying  AB  and  then  applying  the  same  row  operation  to  the  product  matrix. 


Solving  Linear  Systems  with  the  Inverse 

The  primary  motivation  for  introducing  the  matrix  inverse  is  that  it  provides  a  compact 
formula  for  the  solution  to  any  linear  system  with  an  invertible  coefficient  matrix. 


Theorem 1.28. If the matrix A is nonsingular, then $x = A^{-1} b$ is the unique solution to
the linear system Ax = b.


Proof: We merely multiply the system by $A^{-1}$, which yields $x = A^{-1} A x = A^{-1} b$.
Moreover, $A x = A A^{-1} b = b$, proving that $x = A^{-1} b$ is indeed the solution. Q.E.D.


For example, let us return to the linear system (1.29). Since we computed the inverse
of its coefficient matrix in (1.46), a “direct” way to solve the system is to multiply the
right-hand side by the inverse matrix:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -\frac{23}{18} & \frac{7}{18} & \frac{2}{9} \\ \frac{7}{18} & \frac{1}{18} & -\frac{1}{9} \\ \frac{2}{9} & -\frac{1}{9} & \frac{2}{9} \end{pmatrix} \begin{pmatrix} 2 \\ 7 \\ 3 \end{pmatrix} = \begin{pmatrix} \frac{5}{6} \\ \frac{5}{6} \\ \frac{1}{3} \end{pmatrix},$$

reproducing our earlier solution.

However, while aesthetically appealing, the solution method based on the inverse matrix
is hopelessly inefficient as compared to direct Gaussian Elimination, and, despite what you
may have been told, should not be used in practical computations. (A complete justification
of this dictum will be provided in Section 1.7.) On the other hand, the inverse does play
a useful role in theoretical developments, as well as providing insight into the design of
practical algorithms. But the principal message of applied linear algebra is that LU
decomposition and Gaussian Elimination are fundamental; matrix inverses are to be avoided
in all but the most elementary computations.
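
To make the comparison concrete, here is a small sketch that solves the same system both
ways: once by Gaussian Elimination followed by Back Substitution applied to the augmented
matrix (A | b), and once by multiplying b by the inverse (1.46). The function name
`solve_by_elimination` and the data layout are our own choices, and the right-hand side
(2, 7, 3) is the one used in the preceding example. The sketch illustrates only that the two
answers agree; it makes no attempt to measure the operation counts discussed in Section 1.7.

```python
from fractions import Fraction

def solve_by_elimination(A, b):
    """Solve A x = b by Gaussian Elimination with row interchanges and Back Substitution."""
    n = len(A)
    # Augmented matrix (A | b) with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for j in range(n):
        # Find a usable pivot in column j, interchanging rows if necessary.
        p = next((r for r in range(j, n) if M[r][j] != 0), None)
        if p is None:
            raise ValueError("coefficient matrix is singular")
        M[j], M[p] = M[p], M[j]
        for r in range(j + 1, n):
            m = M[r][j] / M[j][j]
            M[r] = [x - m * y for x, y in zip(M[r], M[j])]
    # Back Substitution on the resulting upper triangular system.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

A = [[0, 2, 1], [2, 6, 1], [1, 1, 4]]
b = [2, 7, 3]

# The inverse matrix (1.46), entered by hand.
A_inv = [
    [Fraction(-23, 18), Fraction(7, 18), Fraction(2, 9)],
    [Fraction(7, 18), Fraction(1, 18), Fraction(-1, 9)],
    [Fraction(2, 9), Fraction(-1, 9), Fraction(2, 9)],
]

x_elimination = solve_by_elimination(A, b)
x_via_inverse = [sum(a * c for a, c in zip(row, b)) for row in A_inv]

print([str(v) for v in x_elimination])
print(x_elimination == x_via_inverse)
```

The elimination route never forms the n columns of the inverse; it works with a single
right-hand side throughout.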


Remark. The reader may have learned a version of the Gauss-Jordan algorithm for
solving a single linear system that replaces the Back Substitution step by a complete
reduction of the coefficient matrix to the identity. In other words, to solve Ax = b, we
start with the augmented matrix $M = (A \mid b)$ and use all three types of elementary
row operations to produce (assuming nonsingularity) the fully reduced form $(I \mid d)$,
representing the trivially soluble, equivalent system x = d, which is the solution to the
original system. However, Back Substitution is more efficient, and it remains the method
of choice in practical computations.


Exercises 


1.5.31.  Solve  the  following  systems  of  linear  equations  by  computing  the  inverses  of  their 
coefficient  matrices. 

x  —  y  +  3z  =  3,  y-\-5z  =  3, 

(a)X  +  2y  (b)  3U  2V  2’  (c)  x-2y  +  3z  =  -2,  (d)  x  -  y  +  3z  =  -1. 

x-2y  =  -2.  u  +  5v  =  12.  x_2y  +  z  =  2.  -2x  +  3y  =  5. 


x  +  Ay  —  z  =  3, 
(e)  2x-\-7y  —  2^  =  5, 

—  x  —  5y  +  2z  =  —  7. 


(n 


x  +  y  =  4, 

2  x  +  3  y  —  w  =  11, 
—  y  -  z  +  w  =  -7, 
z  —  w  =  6. 


(g) 


x  —  2y  -h  z  -j-  2u  =  —2. 
x  —  y-hz  —  u  =  3, 
2x  —  y  -j-  z  +  u  =  3, 
x  -h  3y  —  2z  —  u  =  2. 


1.5.32.  For  each  of  the  nonsingular  matrices  in  Exercise  1.5.25,  use  your  computed  inverse  to 
solve  the  associated  linear  system  Ax  =  b,  where  b  is  the  column  vector  of  the  appropriate 
size  that  has  all  l’s  as  its  entries. 


The LDV Factorization

The second phase of the Gauss-Jordan process leads to a slightly more detailed version of
the LU factorization. Let D denote the diagonal matrix having the same diagonal entries
as U; in other words, D contains the pivots on its diagonal and zeros everywhere else. Let
V be the upper unitriangular matrix obtained from U by dividing each row by its pivot,
so that V has all 1's on the diagonal. We already encountered V during the course of
the Gauss-Jordan procedure. It is easily seen that U = DV, which implies the following
result.

Theorem  1.29.  A  matrix  A  is  regular  if  and  only  if  it  admits  a  factorization 

A  =  LDV ,  (1.50) 

where  L  is  a  lower  unitriangular  matrix,  D  is  a  diagonal  matrix  with  nonzero  diagonal 
entries,  and  V  is  an  upper  unitriangular  matrix. 


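For the matrix appearing in Example 1.4, we have U = DV, where

$$U = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \qquad
D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \qquad
V = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{2} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

This leads to the factorization

$$A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{2} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = L D V.$$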
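
The construction of L, D and V can be sketched as follows for a regular matrix, that is, one
that can be reduced without row interchanges. The function name `ldv` and the list-of-rows
representation are our own assumptions; the multipliers are stored in L exactly as in the LU
factorization, after which each row of U is divided by its pivot to produce V.

```python
from fractions import Fraction

def ldv(A):
    """Factor a regular square matrix as A = L D V, with L, V unitriangular and D diagonal."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        if U[j][j] == 0:
            raise ValueError("zero pivot encountered: the matrix is not regular")
        for i in range(j + 1, n):
            L[i][j] = U[i][j] / U[j][j]          # multiplier, stored below the diagonal of L
            U[i] = [x - L[i][j] * y for x, y in zip(U[i], U[j])]
    # D carries the pivots; V is U with each row divided by its pivot.
    D = [[U[i][i] if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    V = [[x / U[i][i] for x in U[i]] for i in range(n)]
    return L, D, V

A = [[2, 1, 1], [4, 5, 2], [2, -2, 0]]
L, D, V = ldv(A)
for name, M in (("L", L), ("D", D), ("V", V)):
    print(name, [[str(x) for x in row] for row in M])
```

A matrix that requires a row interchange makes the sketch raise an error; handling that case
would lead to the permuted factorization PA = LDV.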
Proposition 1.30. If A = LU is regular, then the factors L and U are uniquely
determined. The same holds for the A = LDV factorization.

Proof: Suppose $LU = \tilde{L} \tilde{U}$. Since the diagonal entries of all four matrices are non-zero,
Proposition 1.27 implies that they are invertible. Therefore,

$$\tilde{L}^{-1} L = \tilde{L}^{-1} L U U^{-1} = \tilde{L}^{-1} \tilde{L} \tilde{U} U^{-1} = \tilde{U} U^{-1}. \tag{1.51}$$

The left-hand side of the matrix equation (1.51) is the product of two lower unitriangular
matrices, and so, by Lemma 1.2, is itself lower unitriangular. The right-hand side is the
product of two upper triangular matrices, and hence is upper triangular. But the only way
a lower unitriangular matrix can equal an upper triangular matrix is if they both equal
the diagonal identity matrix. Therefore, $\tilde{L}^{-1} L = I = \tilde{U} U^{-1}$, and so $\tilde{L} = L$ and $\tilde{U} = U$,
proving the first result. The LDV version is an immediate consequence. Q.E.D.

As  you  may  have  guessed,  the  more  general  cases  requiring  one  or  more  row  interchanges 
lead  to  a  permuted  LDV  factorization  in  the  following  form. 

Theorem  1.31.  A  matrix  A  is  nonsingular  if  and  only  if  there  is  a  permutation  matrix  P 
such  that 

PA  =  LDV,  (1.52) 

where L is a lower unitriangular matrix, D is a diagonal matrix with nonzero diagonal
entries, and V is an upper unitriangular matrix.

Uniqueness does not hold for the more general permuted factorizations (1.33), (1.52),
since there may be several permutation matrices that place a matrix in regular form; an
explicit example can be found in Exercise 1.4.23. Moreover, in contrast to regular Gaussian
Elimination, here the pivots, i.e., the diagonal entries of U, are no longer uniquely defined,
but depend on the particular combination of row interchanges employed during the course
of the computation.


Exercises 


1.5.33.  Produce  the  LDV  or  a  permuted  LDV  factorization  of  the  following  matrices: 
(a)
3


2 

1 


(*>) 


(e) 


/  2  -3 

1  -1 
\1  -1 


2  \ 
1 
2y 


(i) 


0  4' 
7  2 

(2 

1 

2\ 

(l 
).

(c) 
4

-1 

•> 

0) 

1 

1 

/ 

-2 

L 

\2 

-1 

(1 

-1 

1 

2  \ 

( 1 

0 

2 

— 

1 

-4 

1 

5 

(g) 

2 

-2 

0 
2

-1 

-1 

1 

-2 

-2 
V  3
1

5  / 

VO 
2  J


1.5.34. Using the LDV factorization for the matrices you found in parts (a) to (g) of Exercise
1.5.33, solve the corresponding linear systems Ax = b, for the indicated vector b.


(a) 


1 

2 


(*>) 


-1 

-2 


(c) 


/  1\ 

/-1\ 

(  -  1\ 

(  2\ 

( 

-3 

>  (d) 

4  ,  (e) 

-2 

.  (0 

-9 

3 

.  (g) 
n

V  2  / 

V-1  / 

U 

4  / 

w 

1-3  y 
1.6 Transposes and Symmetric Matrices

Another basic operation on matrices is to interchange their rows and columns. If A is an
m × n matrix, then its transpose, denoted by $A^T$, is the n × m matrix whose (i, j) entry
equals the (j, i) entry of A; thus

$$B = A^T \qquad \text{means that} \qquad b_{ij} = a_{ji}.$$

For example, if

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \qquad \text{then} \qquad A^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}.$$

Observe that the rows of A become the columns of $A^T$ and vice versa. In particular, the
transpose of a row vector is a column vector, while the transpose of a column vector is a
row vector; if $v = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$, then $v^T = (1 \; 2 \; 3)$. The transpose of a scalar, considered as a
1 × 1 matrix, is itself: $c^T = c$.

Remark. Most vectors appearing in applied mathematics are column vectors. To
conserve vertical space in this text, we will often use the transpose notation, e.g.,
$v = (v_1, v_2, v_3)^T$, as a compact way of writing the column vector
$v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}$.

In  the  square  case,  transposition  can  be  viewed  as  “reflecting”  the  matrix  entries  across 
the  main  diagonal.  For  example, 


1 

3 

2 


2 

0 

4 


T 


In particular, the transpose of a lower triangular matrix is upper triangular and vice-versa.
Transposing twice returns you to where you started:

$$(A^T)^T = A. \tag{1.53}$$

Unlike inversion, transposition is compatible with matrix addition and scalar multiplication:

$$(A + B)^T = A^T + B^T, \qquad (cA)^T = c A^T. \tag{1.54}$$

Transposition is also compatible with matrix multiplication, but with a twist. Like the
inverse, the transpose reverses the order of multiplication:

$$(AB)^T = B^T A^T. \tag{1.55}$$

Indeed, if A has size m × n and B has size n × p, so they can be multiplied, then $A^T$ has
size n × m and $B^T$ has size p × n, and so, in general, one has no choice but to multiply
$B^T A^T$ in that order. Formula (1.55) is a straightforward consequence of the basic laws of
matrix multiplication. More generally,

$$(A_1 A_2 \cdots A_{k-1} A_k)^T = A_k^T A_{k-1}^T \cdots A_2^T A_1^T.$$
An important special case is the product of a row vector $v^T$ and a column vector w with
the same number of entries. In this case,

$$v^T w = (v^T w)^T = w^T v, \tag{1.56}$$

because their product is a scalar and so, as noted above, equals its own transpose.
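
A quick numerical check of the transposition rules is sketched below. The helpers `transpose`
and `matmul` are our own, the matrix A is the 2 × 3 example used at the start of this
section, and the matrix B and the vectors v, w are hypothetical values chosen only for
illustration.

```python
def transpose(A):
    """Interchange the rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, 3], [4, 5, 6]]        # size 2 x 3
B = [[1, 0], [2, -1], [0, 3]]     # size 3 x 2 (hypothetical)

lhs = transpose(matmul(A, B))             # (AB)^T
rhs = matmul(transpose(B), transpose(A))  # B^T A^T
print(lhs == rhs)

# Column vectors are stored as n x 1 matrices.
v = [[1], [2], [3]]
w = [[4], [0], [-1]]
vTw = matmul(transpose(v), w)     # a 1 x 1 matrix
wTv = matmul(transpose(w), v)
print(vTw == wTv)

print(transpose(transpose(A)) == A)
```

Here the product $A^T B^T$ is also defined, but it is a 3 × 3 matrix, so it cannot
equal the 2 × 2 matrix $(AB)^T$.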

Lemma 1.32. If A is a nonsingular matrix, so is $A^T$, and its inverse is denoted by

$$A^{-T} = (A^T)^{-1} = (A^{-1})^T. \tag{1.57}$$

Thus, transposing a matrix and then inverting yields the same result as first inverting and
then transposing.

Proof: Let $X = (A^{-1})^T$. Then, according to (1.55),

$$X A^T = (A^{-1})^T A^T = (A A^{-1})^T = I^T = I.$$

The proof that $A^T X = I$ is similar, and so we conclude that $X = (A^T)^{-1}$. Q.E.D.


Exercises 


1.6.1.  Write  down  the  transpose  of  the  following  matrices:  (a)  (  ^  ),  (b) 


1  1 
0  2 


(c) 


1  2 
2  1 


(d) 


12-1 
2  0  2 


1.6.2.  Let  A  = 


3 

1 


1 

2 


1 

1 


/-I  2 


71  2\ 

71  2  -1\ 

2  -3),  (f) 

3  4 

.  (g) 

0  3  2 

\  5  6  J 

U  1  5j 

B  = 


2  0  ] .  Compute  AT  and  BT .  Then  compute  ( AB)T 

■3  4 


m 

and  (BA)  without  first  computing  A 5  or  5 A. 

1.6.3. Show that $(AB)^T = A^T B^T$ if and only if $A$ and $B$ are square commuting matrices.
♦ 1.6.4. Prove formula (1.55).

1.6.5. Find a formula for the transposed product $(ABC)^T$ in terms of $A^T$, $B^T$ and $C^T$.

1.6.6. True or false: Every square matrix $A$ commutes with its transpose $A^T$.


♦ 1.6.7. A square matrix is called normal if it commutes with its transpose: $A^T A = A A^T$.
Find all normal $2 \times 2$ matrices.


1.6.8.  (a)  Prove  that  the  inverse  transpose  operation  (1.57)  respects  matrix  multiplication: 
(AB)~t  =  A~tB~t.  (b)  Verify  this  identity  for  A  =  ^  j  jV  B  = 

1.6.9. Prove that if $A$ is an invertible matrix, then $A A^T$ and $A^T A$ are also invertible.

1.6.10. If $v$, $w$ are column vectors with the same number of entries, does $v w^T = w v^T$?

1.6.11. Is there a matrix analogue of formula (1.56), namely $A^T B = B^T A$?

♦ 1.6.12. (a) Let $A$ be an $m \times n$ matrix. Let $e_j$ denote the $n \times 1$ column vector with a single 1
in the $j$th entry, as in (1.44). Explain why the product $A e_j$ equals the $j$th column of $A$.
(b) Similarly, let $\hat{e}_i$ be the $m \times 1$ column vector with a single 1 in the $i$th entry. Explain
why the triple product $\hat{e}_i^T A e_j = a_{ij}$ equals the $(i,j)$ entry of the matrix $A$.

♦ 1.6.13. Let $A$ and $B$ be $m \times n$ matrices. (a) Suppose that $v^T A w = v^T B w$ for all vectors
$v$, $w$. Prove that $A = B$. (b) Give an example of two matrices such that $v^T A v = v^T B v$
for all vectors $v$, but $A \neq B$.

♦ 1.6.14. (a) Explain why the inverse of a permutation matrix equals its transpose: $P^{-1} = P^T$.
(b) If $A^{-1} = A^T$, is $A$ necessarily a permutation matrix?

♦ 1.6.15. Let $A$ be a square matrix and $P$ a permutation matrix of the same size. (a) Explain
why the product $A P^T$ has the effect of applying the permutation defined by $P$ to the
columns of $A$. (b) Explain the effect of multiplying $P A P^T$. Hint: Try this on some $3 \times 3$
examples first.


♥ 1.6.16. Let $v$, $w$ be $n \times 1$ column vectors. (a) Prove that in most cases the inverse of the $n \times n$
matrix $A = I - v w^T$ has the form $A^{-1} = I - c\, v w^T$ for some scalar $c$. Find all $v$, $w$ for


which  such  a  result  is  valid,  (b)  Illustrate  the  method  when  v 
(c)  What  happens  when  the  method  fails? 


and  w  = 


Factorization  of  Symmetric  Matrices 

A  particularly  important  class  of  square  matrices  consists  of  those  that  are  unchanged  by 
the  transpose  operation. 

Definition 1.33. A matrix is called symmetric if it equals its own transpose: $A = A^T$.

Thus, $A$ is symmetric if and only if it is square and its entries satisfy $a_{ji} = a_{ij}$ for all
$i, j$. In other words, entries lying in "mirror image" positions relative to the main diagonal
must be equal. For example, the most general symmetric $3 \times 3$ matrix has the form

$$ A = \begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix}. $$

Note  that  all  diagonal  matrices,  including  the  identity,  are  symmetric.  A  lower  or  upper 
triangular  matrix  is  symmetric  if  and  only  if  it  is,  in  fact,  a  diagonal  matrix. 

The  LDV  factorization  of  a  nonsingular  matrix  takes  a  particularly  simple  form  if 
the  matrix  also  happens  to  be  symmetric.  This  result  will  form  the  foundation  of  some 
significant  later  developments. 

Theorem 1.34. A symmetric matrix $A$ is regular if and only if it can be factored as

$$ A = L D L^T, \tag{1.58} $$

where $L$ is a lower unitriangular matrix and $D$ is a diagonal matrix with nonzero diagonal
entries.

Proof: We already know, according to Theorem 1.29, that we can factor

$$ A = L D V. \tag{1.59} $$

We take the transpose of both sides of this equation:

$$ A^T = (L D V)^T = V^T D^T L^T = V^T D L^T, \tag{1.60} $$

since diagonal matrices are automatically symmetric: $D^T = D$. Note that $V^T$ is lower
unitriangular, and $L^T$ is upper unitriangular. Therefore (1.60) is the $LDV$ factorization
of $A^T$.

In particular, if $A$ is symmetric, then

$$ L D V = A = A^T = V^T D L^T. $$

Uniqueness of the $LDV$ factorization implies that

$$ L = V^T \quad \text{and} \quad V = L^T $$

(which are two versions of the same equation). Replacing $V$ by $L^T$ in (1.59) establishes
the factorization (1.58). Q.E.D.

Remark. If $A = L D L^T$, then $A$ is necessarily symmetric. Indeed,

$$ A^T = (L D L^T)^T = (L^T)^T D^T L^T = L D L^T = A. $$

However, not every symmetric matrix has an $L D L^T$ factorization. A simple example is
the irregular but nonsingular $2 \times 2$ matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.

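Example 1.35. The problem is to find the $L D L^T$ factorization of the particular symmetric
matrix $A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}$. This requires performing the usual Gaussian Elimination
algorithm. Subtracting twice the first row from the second and also the first row from the
third produces the matrix $\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & -1 & 3 \end{pmatrix}$. We then add one half of the second row of the
latter matrix to its third row, resulting in the upper triangular form

$$ U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & 0 & \frac{5}{2} \end{pmatrix}
     = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & \frac{5}{2} \end{pmatrix}
       \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & -\frac{1}{2} \\ 0 & 0 & 1 \end{pmatrix} = D V, $$

which we further factor by dividing each row of $U$ by its pivot. On the other hand, the lower
unitriangular matrix associated with the preceding row operations is
$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -\frac{1}{2} & 1 \end{pmatrix}$,
which, as guaranteed by Theorem 1.34, is the transpose of $V = L^T$. Therefore, the desired
$A = LU = L D L^T$ factorizations of this particular symmetric matrix are

$$ \begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}
 = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -\frac{1}{2} & 1 \end{pmatrix}
   \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & 0 & \frac{5}{2} \end{pmatrix}
 = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -\frac{1}{2} & 1 \end{pmatrix}
   \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & \frac{5}{2} \end{pmatrix}
   \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & -\frac{1}{2} \\ 0 & 0 & 1 \end{pmatrix}. $$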


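Example 1.36. Let us look at a general $2 \times 2$ symmetric matrix $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$.
Regularity requires that the first pivot be $a \neq 0$. A single row operation will place $A$
in upper triangular form $U = \begin{pmatrix} a & b \\ 0 & \frac{ac - b^2}{a} \end{pmatrix}$, and so $A$ is regular provided $ac - b^2 \neq 0$
also. The associated lower triangular matrix is $L = \begin{pmatrix} 1 & 0 \\ \frac{b}{a} & 1 \end{pmatrix}$. Thus, $A = LU$, as you can
check. Finally, $D = \begin{pmatrix} a & 0 \\ 0 & \frac{ac - b^2}{a} \end{pmatrix}$ is just the diagonal part of $U$, and hence $U = D L^T$,
so that the $L D L^T$ factorization is explicitly given by

$$ \begin{pmatrix} a & b \\ b & c \end{pmatrix}
 = \begin{pmatrix} 1 & 0 \\ \frac{b}{a} & 1 \end{pmatrix}
   \begin{pmatrix} a & 0 \\ 0 & \frac{ac - b^2}{a} \end{pmatrix}
   \begin{pmatrix} 1 & \frac{b}{a} \\ 0 & 1 \end{pmatrix}. \tag{1.61} $$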
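
The procedure used in Examples 1.35 and 1.36 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `ldlt` and `ldlt_product` are hypothetical, exact rational arithmetic is used to avoid rounding, and no row interchanges are attempted, so the routine simply reports failure when a zero pivot appears (that is, when the matrix is not regular).

```python
from fractions import Fraction


def ldlt(A):
    """Factor a regular symmetric matrix as A = L D L^T.

    Returns (L, D) with L lower unitriangular (list of rows) and D the list
    of pivots. Raises ValueError if A is not symmetric or not regular.
    """
    n = len(A)
    if any(A[i][j] != A[j][i] for i in range(n) for j in range(n)):
        raise ValueError("matrix is not symmetric")
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        if U[j][j] == 0:
            raise ValueError("zero pivot: the matrix is not regular")
        for i in range(j + 1, n):
            # Multiplier used to clear the (i, j) entry below the pivot.
            L[i][j] = U[i][j] / U[j][j]
            for k in range(j, n):
                U[i][k] -= L[i][j] * U[j][k]
    D = [U[j][j] for j in range(n)]
    return L, D


def ldlt_product(L, D):
    """Multiply out L D L^T, where D is given as a list of diagonal entries."""
    n = len(D)
    return [[sum(L[i][k] * D[k] * L[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


# The symmetric matrix of Example 1.35.
A = [[1, 2, 1], [2, 6, 1], [1, 1, 4]]
L, D = ldlt(A)
print(L)
print(D)
print(ldlt_product(L, D) == A)
```

Calling `ldlt` on a symmetric matrix whose (1,1) entry vanishes raises the `ValueError`, in line with the Remark that not every symmetric matrix admits such a factorization.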


Exercises 


1.6.17.  Find  all  values  of  a,  6,  and  c  for  which  the  following  matrices  are  symmetric: 


(a) 


3 

2a-  1 


a 

a  —  2 


\ 

1 

a 

2  \ 

/ 

>  (0 

-1 

b 

c 

.  (c) 

V  b 

3 

o  J  V 

3 

6 


a  +  2b  —  2  c  —4 

7  b  —  c 

4  b  +  3c 


1.6.18.  List  all  symmetric  (a)  3x3  permutation  matrices,  (b)  4x4  permutation  matrices. 

1.6.19. True or false: If $A$ is symmetric, then $A^2$ is symmetric.

♦ 1.6.20. True or false: If $A$ is a nonsingular symmetric matrix, then $A^{-1}$ is also symmetric.

♦ 1.6.21. True or false: If $A$ and $B$ are symmetric $n \times n$ matrices, so is $AB$.

1.6.22. (a) Show that every diagonal matrix is symmetric. (b) Show that an upper (lower)
triangular matrix is symmetric if and only if it is diagonal.

1.6.23. Let $A$ be a symmetric matrix. (a) Show that $A^n$ is symmetric for every nonnegative
integer $n$. (b) Show that $2A^2 - 3A + I$ is symmetric. (c) Show that every matrix
polynomial $p(A)$ of $A$, cf. Exercise 1.2.35, is a symmetric matrix.

1.6.24. Show that if $A$ is any matrix, then $K = A^T A$ and $L = A A^T$ are both well-defined,
symmetric matrices.

1.6.25.  Find  the  LDLT  factorization  of  the  following  symmetric  matrices: 

/  1 

1  1  \  ,,  x  (  -2  3s 


(a) 


1  4 


(b) 


3  -1 


(c) 


(d) 


1.6.26.  Find  the  LDLr  factorization  of  the  matrices 

(2  1  0 
12  1 


M2  = 


2  1 
1  2 


M3  = 


Vo 


0  1.6.27.  Prove  that  the  3x3  matrix  A  = 


1  2 

(l 

2 


and  M4  = 


/  2 
1 
0 


V 

1 

2 
1 


1 

0 

3 


-1 

2 

2 

0 


0  3  \ 
2  0 


-1 

0 


0 

1/ 


0  0\ 


1 

2 

1 


0 
1 
2  J 


2 

4 

1 


\o  o 

cannot be factored as $A = L D L^T$.


♥ 1.6.28. Skew-symmetric matrices: An $n \times n$ matrix $J$ is called skew-symmetric if $J^T = -J$.
(a) Show that every diagonal entry of a skew-symmetric matrix is zero. (b) Write down
an example of a nonsingular skew-symmetric matrix. (c) Can you find a regular
skew-symmetric matrix? (d) Show that if $J$ is a nonsingular skew-symmetric matrix, then $J^{-1}$ is
also skew-symmetric. Verify this fact for the matrix you wrote down in part (b). (e) Show
that if $J$ and $K$ are skew-symmetric, then so are $J^T$, $J + K$, and $J - K$. What about $J K$?
(f) Prove that if $J$ is a skew-symmetric matrix, then $v^T J v = 0$ for all vectors $v \in \mathbb{R}^n$.
1.6.29. (a) Prove that every square matrix can be expressed as the sum, $A = S + J$, of a
symmetric matrix $S = S^T$ and a skew-symmetric matrix $J = -J^T$.


(b)  Write 


/  1  o  \ 

(l 

2 

3  \ 

O  A  and 

4 

5 

6 

V  3  4  J 

8 

9  / 

as  the  sum  of  symmetric  and  skew-symmetric  matrices. 


♦ 1.6.30. Suppose $A = LU$ is a regular matrix. Write down the $LU$ factorization of $A^T$. Prove
that $A^T$ is also regular, and its pivots are the same as the pivots of $A$.


1.7  Practical  Linear  Algebra 


For pedagogical and practical reasons, the examples and exercises we have chosen to illustrate
the algorithms are all based on relatively small matrices. When dealing with matrices
of moderate size, the differences between the various approaches to solving linear systems
(Gauss, Gauss-Jordan, matrix inverse, and so on) are relatively unimportant, particularly
if one has a decent computer or even hand calculator to do the tedious parts. However,
real-world applied mathematics deals with much larger linear systems, and the design of
efficient algorithms is a must. For example, numerical solution schemes for ordinary differential
equations will typically lead to matrices with thousands of entries, while numerical
schemes for partial differential equations arising in fluid and solid mechanics, weather
prediction, image and video processing, quantum mechanics, molecular dynamics, chemical
processes, etc., will often require dealing with matrices with more than a million entries.
It is not hard for such systems to tax even the most sophisticated supercomputer. Thus, it
is essential that we understand the computational details of competing methods in order
to compare their efficiency, and thereby gain some experience with the issues underlying
the design of high performance numerical algorithms.

The most basic question is this: how many arithmetic operations* (in numerical
applications these are almost always performed in floating point with various precision
levels) are required to complete an algorithm? The number will directly influence the
time spent running the algorithm on a computer. We shall keep track of additions and
multiplications separately, since the latter typically take longer to process.** But we shall
not distinguish between addition and subtraction, nor between multiplication and division,
since these typically have the same complexity. We shall also assume that the matrices
and vectors we deal with are generic, with few, if any, zero entries. Modifications of the
basic algorithms for sparse matrices, meaning those that have lots of zero entries, are an
important topic of research, since these include many of the large matrices that appear
in applications to differential equations. We refer the interested reader to more advanced
treatments of numerical linear algebra, such as [21, 40, 66, 89], for such developments.

First, when multiplying an $n \times n$ matrix $A$ and an $n \times 1$ column vector $b$, each entry
of the product $A b$ requires $n$ multiplications of the form $a_{ij} b_j$ and $n - 1$ additions to sum
the resulting products. Since there are $n$ entries, this means a total of $n^2$ multiplications
and $n(n - 1) = n^2 - n$ additions. Thus, for a matrix of size $n = 100$, one needs about
10,000 distinct multiplications and a similar number of additions. If $n = 1{,}000{,}000 = 10^6$,
then $n^2 = 10^{12}$, which is phenomenally large, and the total time required to perform the
computation becomes a significant issue.†

* For simplicity, we will count only the basic arithmetic operations. But it is worth noting
that other issues, such as the number of storage and retrieval operations, may also play a role in
estimating the computational complexity of a numerical algorithm.

** At least, in traditional computer architectures. New algorithms and new methods for
performing basic arithmetic operations on a computer, particularly in high precision arithmetic, make
this discussion trickier. For simplicity, we will stay with the "classical" version here.

Let us next look at the (regular) Gaussian Elimination algorithm, referring back to
our pseudocode program for the notational details. First, we count how many arithmetic
operations are based on the $j$th pivot $m_{jj}$. For each of the $n - j$ rows lying below it, we
must perform one division to compute the factor $l_{ij} = m_{ij} / m_{jj}$ used in the elementary
row operation. The entries in the column below the pivot will be set to zero automatically,
and so we need only compute the updated entries lying strictly below and to the right of
the pivot. There are $(n - j)^2$ such entries in the coefficient matrix and an additional $n - j$
entries in the last column of the augmented matrix. Let us concentrate on the former for
the moment. For each of these, we replace $m_{ik}$ by $m_{ik} - l_{ij} m_{jk}$, and so must perform one
multiplication and one addition. For the $j$th pivot, there is a total of $(n - j)(n - j + 1)$
multiplications (including the initial $n - j$ divisions needed to produce the $l_{ij}$) and
$(n - j)^2$ additions needed to update the coefficient matrix. Therefore, to reduce a regular
$n \times n$ matrix to upper triangular form requires a total‡ of

$$ \sum_{j=1}^{n} (n - j)(n - j + 1) = \frac{n^3 - n}{3} \ \text{ multiplications, and} \qquad
   \sum_{j=1}^{n} (n - j)^2 = \frac{2n^3 - 3n^2 + n}{6} \ \text{ additions.} \tag{1.62} $$

Thus, when $n$ is large, both involve approximately $\frac{1}{3} n^3$ operations.
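
The totals in (1.62) can be checked empirically by instrumenting a straightforward implementation of regular Gaussian Elimination. The sketch below is an illustration only: the names `count_elimination_ops` and `sample_matrix` are hypothetical, the routine works on the coefficient matrix alone (no right-hand side), and the sample matrices are chosen to be strictly diagonally dominant so that no zero pivot can occur.

```python
from fractions import Fraction


def count_elimination_ops(A):
    """Reduce a regular matrix to upper triangular form, counting operations.

    Returns (multiplications_and_divisions, additions_and_subtractions).
    """
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    mults = adds = 0
    for j in range(n):
        if M[j][j] == 0:
            raise ValueError("zero pivot: the matrix is not regular")
        for i in range(j + 1, n):
            factor = M[i][j] / M[j][j]   # one division per row below the pivot
            mults += 1
            M[i][j] = Fraction(0)        # set to zero without any arithmetic
            for k in range(j + 1, n):
                M[i][k] -= factor * M[j][k]   # one multiplication, one addition
                mults += 1
                adds += 1
    return mults, adds


def sample_matrix(n):
    """A strictly diagonally dominant (hence regular) n x n test matrix."""
    return [[n + 1 if i == j else 1 for j in range(n)] for i in range(n)]


for n in (2, 5, 10):
    mults, adds = count_elimination_ops(sample_matrix(n))
    # Compare the measured counts with the closed forms in (1.62).
    print(n, mults, (n**3 - n) // 3, adds, (2 * n**3 - 3 * n**2 + n) // 6)
```

The counts depend only on $n$, not on the particular entries, as long as every pivot is nonzero.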

We should also be keeping track of the number of operations on the right-hand side of
the system. No pivots appear there, and so there are

$$ \sum_{j=1}^{n} (n - j) = \frac{n^2 - n}{2} \tag{1.63} $$

multiplications and the same number of additions required to produce the right-hand side
in the resulting triangular system $U x = c$. For large $n$, this count is considerably smaller
than the coefficient matrix totals (1.62). We note that the Forward Substitution equations
(1.26) require precisely the same number of arithmetic operations to solve $L c = b$ for the
right-hand side of the upper triangular system. Indeed, the $j$th equation

$$ c_j = b_j - \sum_{k=1}^{j-1} l_{jk} c_k $$

requires $j - 1$ multiplications and the same number of additions, giving a total of

$$ \sum_{j=1}^{n} (j - 1) = \frac{n^2 - n}{2} $$

operations of each type. Therefore, to reduce a linear system to upper triangular form,
it makes no difference in computational efficiency whether one works directly with the
augmented matrix or employs Forward Substitution after the LU factorization of the
coefficient matrix has been established.

† See Exercise 1.7.8 for more sophisticated computational algorithms that can be employed to
(slightly) speed up multiplication of large matrices.

‡ In Exercise 1.7.4, the reader is asked to prove these summation formulas by induction.


The Back Substitution phase of the algorithm can be similarly analyzed. To find the
value of

$$ x_j = \frac{1}{u_{jj}} \left( c_j - \sum_{k=j+1}^{n} u_{jk} x_k \right) $$

once we have computed $x_{j+1}, \dots, x_n$, requires $n - j + 1$ multiplications/divisions and $n - j$
additions. Therefore, Back Substitution requires

$$ \sum_{j=1}^{n} (n - j + 1) = \frac{n^2 + n}{2} \ \text{ multiplications, along with} \qquad
   \sum_{j=1}^{n} (n - j) = \frac{n^2 - n}{2} \ \text{ additions.} \tag{1.64} $$

For $n$ large, both of these are approximately equal to $\frac{1}{2} n^2$. Comparing the counts, we
conclude that the bulk of the computational effort goes into the reduction of the coefficient
matrix to upper triangular form.


Combining the two counts (1.63-64), we discover that, once we have computed the
$A = LU$ decomposition of the coefficient matrix, the Forward and Back Substitution
process requires $n^2$ multiplications and $n^2 - n$ additions to solve a linear system $A x = b$.
This is exactly the same as the number of multiplications and additions needed to compute
the product $A^{-1} b$. Thus, even if we happen to know the inverse of $A$, it is still just as
efficient to use Forward and Back Substitution to compute the solution!
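
As a concrete illustration of these counts, here is a minimal Python sketch of Forward and Back Substitution with operation counters. The function names are hypothetical, the factors `L` and `U` written out in the code are those obtained by eliminating the $3 \times 3$ matrix of Example 1.35, and the right-hand side `b` is an arbitrary illustrative choice.

```python
from fractions import Fraction


def forward_substitution(L, b):
    """Solve L c = b for lower unitriangular L; return (c, mults, adds)."""
    n = len(b)
    c = [Fraction(0)] * n
    mults = adds = 0
    for j in range(n):
        total = Fraction(b[j])
        for k in range(j):
            total -= L[j][k] * c[k]
            mults += 1
            adds += 1
        c[j] = total          # diagonal entry is 1, so no division is needed
    return c, mults, adds


def back_substitution(U, c):
    """Solve U x = c for upper triangular U; return (x, mults, adds)."""
    n = len(c)
    x = [Fraction(0)] * n
    mults = adds = 0
    for j in reversed(range(n)):
        total = Fraction(c[j])
        for k in range(j + 1, n):
            total -= U[j][k] * x[k]
            mults += 1
            adds += 1
        x[j] = total / U[j][j]   # one division per unknown
        mults += 1
    return x, mults, adds


# LU factors of A = [[1, 2, 1], [2, 6, 1], [1, 1, 4]] (Example 1.35).
L = [[1, 0, 0], [2, 1, 0], [1, Fraction(-1, 2), 1]]
U = [[1, 2, 1], [0, 2, -1], [0, 0, Fraction(5, 2)]]
b = [4, 9, 6]   # illustrative right-hand side

c, m1, a1 = forward_substitution(L, b)
x, m2, a2 = back_substitution(U, c)
n = len(b)
print(x, m1 + m2 == n**2, a1 + a2 == n**2 - n)
```

For this $3 \times 3$ system the combined totals are 9 multiplications/divisions and 6 additions/subtractions, matching $n^2$ and $n^2 - n$.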

On the other hand, the computation of $A^{-1}$ is decidedly more inefficient. There are two
possible strategies. First, we can solve the $n$ linear systems (1.45), namely

$$ A x = e_i, \qquad i = 1, \dots, n, \tag{1.65} $$

for the individual columns of $A^{-1}$. This requires first computing the LU decomposition,
which uses about $\frac{1}{3} n^3$ multiplications and a similar number of additions, followed by applying
Forward and Back Substitution to each of the systems, using $n \cdot n^2 = n^3$ multiplications
and $n(n^2 - n) \approx n^3$ additions, for a grand total of about $\frac{4}{3} n^3$ operations of each type in
order to compute $A^{-1}$. Gauss-Jordan Elimination fares no better (in fact, slightly worse),
also requiring about the same number, $\frac{4}{3} n^3$, of each type of arithmetic operation. Both
algorithms can be made more efficient by exploiting the fact that there are lots of zeros
on the right-hand sides of the systems (1.65). Designing the algorithm to avoid adding
or subtracting a preordained 0, or multiplying or dividing by a preordained ±1, reduces
the total number of operations required to compute $A^{-1}$ to exactly $n^3$ multiplications and
$n(n - 1)^2 \approx n^3$ additions. (Details are relegated to the exercises.) And don't forget that we
still need to multiply $A^{-1} b$ to solve the original system. As a result, solving a linear system
with the inverse matrix requires approximately three times as many arithmetic operations,
and so would take three times as long to complete, as the more elementary Gaussian
Elimination and Back Substitution algorithm. This justifies our earlier contention that matrix
inversion is inefficient, and, except in very special situations, should never be used for
solving linear systems in practice.

Exercises

1.7.1. Solve the following linear systems by (i) Gaussian Elimination with Back Substitution;
(ii) the Gauss-Jordan algorithm to convert the augmented matrix to the fully reduced
form $( I \mid x )$ with solution $x$; (iii) computing the inverse of the coefficient matrix,
and then multiplying it by the right-hand side. Keep track of the number of arithmetic
operations you need to perform to complete each computation, and discuss their relative
efficiency.

2x  —  Ay  +  oz  =  6,  x  —  3y  =1, 

x  —  2y  =  4 

(a)  (b)  3x  —  3y  +  4z  =  —  1,  (c)  3x  —  7y  +  5z  =  —  1, 

3x  +  y  =  —7, 

—  4x  +  3y  —  Az  =  5,  —  2x  +  (ry  —  3z  =  0. 

1.7.2. (a) Let $A$ be an $n \times n$ matrix. Which is faster to compute, $A^2$ or $A^{-1}$? Justify
your answer. (b) What about $A^3$ versus $A^{-1}$? (c) How many operations are needed
to compute $A^k$? Hint: When $k > 3$, you can get away with less than $k - 1$ matrix
multiplications!

1.7.3.  Which  is  faster:  Back  Substitution  or  multiplying  a  matrix  by  a  vector?  How  much  faster? 

♦ 1.7.4. Use induction to prove the summation formulas (1.62), (1.63) and (1.64).

♦ 1.7.5. Let $A$ be a general $n \times n$ matrix. Determine the exact number of arithmetic operations
needed to compute $A^{-1}$ using (a) Gaussian Elimination to factor $P A = L U$ and then
Forward and Back Substitution to solve the $n$ linear systems (1.65); (b) the Gauss-Jordan
method. Make sure your totals do not count adding or subtracting a known 0, or
multiplying or dividing by a known ±1.

1.7.6. Count the number of arithmetic operations needed to solve a system the "old-fashioned"
way, by using elementary row operations of all three types, in the same order as the
Gauss-Jordan scheme, to fully reduce the augmented matrix $M = ( A \mid b )$ to the form $( I \mid d )$,
with $x = d$ being the solution.

1.7.7. An alternative solution strategy, also called Gauss-Jordan in some texts, is, once a pivot
is in position, to use elementary row operations of type #1 to eliminate all entries both
above and below it, thereby reducing the augmented matrix to diagonal form $( D \mid c )$,
where $D = \operatorname{diag}(d_1, \dots, d_n)$ is a diagonal matrix containing the pivots. The solutions
$x_i = c_i / d_i$ are then obtained by simple division. Is this strategy more efficient, less efficient,
or the same as Gaussian Elimination with Back Substitution? Justify your answer with an
exact operations count.


♦ 1.7.8. Here, we describe a remarkable algorithm for matrix multiplication discovered by
Strassen, [82]. Let
$A = \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}$, $B = \begin{pmatrix} B_1 & B_2 \\ B_3 & B_4 \end{pmatrix}$, and $C = \begin{pmatrix} C_1 & C_2 \\ C_3 & C_4 \end{pmatrix} = A B$
be block matrices of size $n = 2m$, where all blocks are of size $m \times m$. (a) Let
$D_1 = (A_1 + A_4)(B_1 + B_4)$, $D_2 = (A_1 - A_3)(B_1 + B_2)$, $D_3 = (A_2 - A_4)(B_3 + B_4)$,
$D_4 = (A_1 + A_2) B_4$, $D_5 = (A_3 + A_4) B_1$, $D_6 = A_4 (B_1 - B_3)$, $D_7 = A_1 (B_2 - B_4)$. Show
that $C_1 = D_1 + D_3 - D_4 - D_6$, $C_2 = D_4 + D_7$, $C_3 = D_5 - D_6$, $C_4 = D_1 - D_2 - D_5 + D_7$.
(b) How many arithmetic operations are required when $A$ and $B$ are $2 \times 2$ matrices? How
does this compare with the usual method of multiplying $2 \times 2$ matrices?
(c) In the general case, suppose we use standard matrix multiplication for the matrix
products in $D_1, \dots, D_7$. Prove that Strassen's Method is faster than the direct algorithm
for computing $A B$ by a factor of $\approx \frac{7}{8}$. (d) When $A$ and $B$ have size $n \times n$ with $n = 2^r$,
we can recursively apply Strassen's Method to multiply the $2^{r-1} \times 2^{r-1}$ blocks $A_i$, $B_i$.
Prove that the resulting algorithm requires a total of $7^r = n^{\log_2 7} = n^{2.80735}$ multiplications
and $6\,(7^{r-1} - 4^{r-1}) < 7^r = n^{\log_2 7}$ additions/subtractions, versus $n^3$ multiplications and
$n^3 - n^2 \approx n^3$ additions for the ordinary matrix multiplication algorithm. How much faster
is Strassen's Method when $n = 2^{10}$? $2^{25}$? $2^{100}$? (e) How might you proceed if the size of
the matrices does not happen to be a power of 2? Further developments of these ideas can
be found in [11, 40].
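
For readers who want to experiment with Exercise 1.7.8 before proving anything, the following is a minimal Python sketch of a single level of Strassen's block scheme, with the seven block products computed by ordinary matrix multiplication. All helper names (`mat_add`, `mat_sub`, `mat_mul`, `split_blocks`, `join_blocks`, `strassen_once`) are hypothetical, the matrices are assumed to be square of even size, and the two test matrices are arbitrary. A numerical comparison is only a sanity check, not a substitute for the proof requested in part (a).

```python
def mat_add(X, Y):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def mat_sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def mat_mul(X, Y):
    """Ordinary row-times-column matrix multiplication."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def split_blocks(M):
    """Split a 2m x 2m matrix into its four m x m blocks M1, M2, M3, M4."""
    m = len(M) // 2
    top, bottom = M[:m], M[m:]
    return ([r[:m] for r in top], [r[m:] for r in top],
            [r[:m] for r in bottom], [r[m:] for r in bottom])


def join_blocks(C1, C2, C3, C4):
    """Reassemble a matrix from its four blocks."""
    return ([a + b for a, b in zip(C1, C2)] +
            [a + b for a, b in zip(C3, C4)])


def strassen_once(A, B):
    """One level of Strassen's scheme: 7 block products instead of 8."""
    A1, A2, A3, A4 = split_blocks(A)
    B1, B2, B3, B4 = split_blocks(B)
    D1 = mat_mul(mat_add(A1, A4), mat_add(B1, B4))
    D2 = mat_mul(mat_sub(A1, A3), mat_add(B1, B2))
    D3 = mat_mul(mat_sub(A2, A4), mat_add(B3, B4))
    D4 = mat_mul(mat_add(A1, A2), B4)
    D5 = mat_mul(mat_add(A3, A4), B1)
    D6 = mat_mul(A4, mat_sub(B1, B3))
    D7 = mat_mul(A1, mat_sub(B2, B4))
    C1 = mat_sub(mat_sub(mat_add(D1, D3), D4), D6)
    C2 = mat_add(D4, D7)
    C3 = mat_sub(D5, D6)
    C4 = mat_add(mat_sub(mat_sub(D1, D2), D5), D7)
    return join_blocks(C1, C2, C3, C4)


# Arbitrary 4 x 4 integer test matrices.
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[2, 0, 1, 3], [1, 1, 0, 2], [4, 5, 6, 0], [0, 7, 8, 9]]
print(strassen_once(A, B) == mat_mul(A, B))
```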


Tridiagonal  Matrices 


Of course, in special cases, the actual arithmetic operation count might be considerably
reduced, particularly if $A$ is a sparse matrix with many zero entries. A number of specialized
techniques have been designed to handle sparse linear systems. A particularly important
class consists of the tridiagonal matrices

$$ A = \begin{pmatrix}
q_1 & r_1 & & & & \\
p_1 & q_2 & r_2 & & & \\
 & p_2 & q_3 & r_3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & p_{n-2} & q_{n-1} & r_{n-1} \\
 & & & & p_{n-1} & q_n
\end{pmatrix} \tag{1.66} $$

with all entries zero except for those on the main diagonal, namely $a_{ii} = q_i$, the
subdiagonal, meaning the $n - 1$ entries $a_{i+1,i} = p_i$ immediately below the main diagonal, and
the superdiagonal, meaning the entries $a_{i,i+1} = r_i$ immediately above the main diagonal.
(Blanks are used to indicate 0 entries.) Such matrices arise in the numerical solution of
ordinary differential equations and the spline fitting of curves for interpolation and
computer graphics. If $A = LU$ is regular, it turns out that the factors are lower and upper
bidiagonal matrices, of the form


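$$ L = \begin{pmatrix}
1 & & & & & \\
l_1 & 1 & & & & \\
 & l_2 & 1 & & & \\
 & & \ddots & \ddots & & \\
 & & & l_{n-2} & 1 & \\
 & & & & l_{n-1} & 1
\end{pmatrix}, \qquad
U = \begin{pmatrix}
d_1 & u_1 & & & & \\
 & d_2 & u_2 & & & \\
 & & d_3 & u_3 & & \\
 & & & \ddots & \ddots & \\
 & & & & d_{n-1} & u_{n-1} \\
 & & & & & d_n
\end{pmatrix}. \tag{1.67} $$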

Multiplying out $L U$ and equating the result to $A$ leads to the equations

$$ \begin{aligned}
d_1 &= q_1, & u_1 &= r_1, & l_1 d_1 &= p_1, \\
l_1 u_1 + d_2 &= q_2, & u_2 &= r_2, & l_2 d_2 &= p_2, \\
&\ \ \vdots & &\ \ \vdots & &\ \ \vdots \\
l_{j-1} u_{j-1} + d_j &= q_j, & u_j &= r_j, & l_j d_j &= p_j, \\
&\ \ \vdots & &\ \ \vdots & &\ \ \vdots \\
l_{n-2} u_{n-2} + d_{n-1} &= q_{n-1}, & u_{n-1} &= r_{n-1}, & l_{n-1} d_{n-1} &= p_{n-1}, \\
l_{n-1} u_{n-1} + d_n &= q_n.
\end{aligned} \tag{1.68} $$

These elementary algebraic equations can be successively solved for the entries of $L$ and $U$
in the following order: $d_1, u_1, l_1, d_2, u_2, l_2, d_3, u_3, \dots$. The original matrix $A$ is regular
provided none of the diagonal entries $d_1, d_2, \dots$ are zero, which allows the recursive procedure
to successfully proceed to termination.
Once the LU factors are in place, we can apply Forward and Back Substitution to solve
the tridiagonal linear system $A x = b$. We first solve the lower triangular system $L c = b$
by Forward Substitution, which leads to the recursive equations

$$ c_1 = b_1, \qquad c_2 = b_2 - l_1 c_1, \qquad \dots, \qquad c_n = b_n - l_{n-1} c_{n-1}. \tag{1.69} $$

We then solve the upper triangular system $U x = c$ by Back Substitution, again recursively:

$$ x_n = \frac{c_n}{d_n}, \qquad x_{n-1} = \frac{c_{n-1} - u_{n-1} x_n}{d_{n-1}}, \qquad \dots, \qquad
   x_1 = \frac{c_1 - u_1 x_2}{d_1}. \tag{1.70} $$

As you can check, there are a total of $5n - 4$ multiplications/divisions and $3n - 3$
additions/subtractions required to solve a general tridiagonal system of $n$ linear equations,
a striking improvement over the general case.
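
The recursions (1.68), (1.69), and (1.70) translate almost line by line into code. The following Python sketch is one possible rendering under stated assumptions: the function name `solve_tridiagonal` and its argument order are hypothetical, the matrix is assumed regular (no row interchanges are attempted), and exact rational arithmetic is used so that the operation counters can be compared with $5n - 4$ and $3n - 3$ without rounding concerns.

```python
from fractions import Fraction


def solve_tridiagonal(p, q, r, b):
    """Solve a regular tridiagonal system A x = b.

    p: subdiagonal (n-1 entries), q: main diagonal (n entries),
    r: superdiagonal (n-1 entries), b: right-hand side (n entries).
    Returns (x, mults, adds), counting multiplications/divisions and
    additions/subtractions.
    """
    n = len(q)
    p = [Fraction(v) for v in p]
    q = [Fraction(v) for v in q]
    u = [Fraction(v) for v in r]      # (1.68): u_j = r_j
    b = [Fraction(v) for v in b]
    mults = adds = 0

    # Factorization (1.68): l_j = p_j / d_j and d_{j+1} = q_{j+1} - l_j u_j.
    d = [q[0]]
    l = []
    for j in range(n - 1):
        if d[j] == 0:
            raise ValueError("zero pivot: the matrix is not regular")
        l.append(p[j] / d[j])
        d.append(q[j + 1] - l[j] * u[j])
        mults += 2
        adds += 1
    if d[-1] == 0:
        raise ValueError("zero pivot: the matrix is not regular")

    # Forward Substitution (1.69).
    c = [b[0]]
    for j in range(1, n):
        c.append(b[j] - l[j - 1] * c[j - 1])
        mults += 1
        adds += 1

    # Back Substitution (1.70).
    x = [Fraction(0)] * n
    x[n - 1] = c[n - 1] / d[n - 1]
    mults += 1
    for j in range(n - 2, -1, -1):
        x[j] = (c[j] - u[j] * x[j + 1]) / d[j]
        mults += 2
        adds += 1
    return x, mults, adds


# Illustrative 5 x 5 system with 4's on the diagonal and 1's beside it;
# the right-hand side is chosen so that the solution is all ones.
n = 5
x, mults, adds = solve_tridiagonal([1] * (n - 1), [4] * n, [1] * (n - 1),
                                   [5, 6, 6, 6, 5])
print(x, mults == 5 * n - 4, adds == 3 * n - 3)
```

Storing only the three diagonals, rather than the full $n \times n$ array, is what keeps both the memory and the operation count proportional to $n$.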


Example 1.37. Consider the $n \times n$ tridiagonal matrix

$$ A = \begin{pmatrix}
4 & 1 & & & & & \\
1 & 4 & 1 & & & & \\
 & 1 & 4 & 1 & & & \\
 & & 1 & 4 & 1 & & \\
 & & & \ddots & \ddots & \ddots & \\
 & & & & 1 & 4 & 1 \\
 & & & & & 1 & 4
\end{pmatrix} $$

in which the diagonal entries are all $q_i = 4$, while the entries immediately above and below
the main diagonal are all $p_i = r_i = 1$. According to (1.68), the tridiagonal factorization
(1.67) has $u_1 = u_2 = \cdots = u_{n-1} = 1$, while

$$ d_1 = 4, \qquad l_j = 1/d_j, \qquad d_{j+1} = 4 - l_j, \qquad j = 1, 2, \dots, n - 1. $$

The computed values are

    j      1      2         3          4          5          6          7
    d_j    4.0    3.75      3.733333   3.732143   3.732057   3.732051   3.732051
    l_j    .25    .266666   .267857    .267942    .267948    .267949    .267949

These converge rapidly to

$$ d_j \to 2 + \sqrt{3} = 3.732050\ldots, \qquad l_j \to 2 - \sqrt{3} = .267949\ldots, $$

which makes the factorization for large $n$ almost trivial. The numbers $2 \pm \sqrt{3}$ are the roots
of the quadratic equation $x^2 - 4x + 1 = 0$, and are characterized as the fixed points of the
nonlinear iterative system $d_{j+1} = 4 - 1/d_j$.
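
The table in Example 1.37 is easy to regenerate. The short Python sketch below iterates $d_{j+1} = 4 - 1/d_j$ in ordinary floating point; the variable names are illustrative, and the printed digits may differ from the table in the last place, since the table appears to truncate rather than round.

```python
import math

d = 4.0
values = []                      # rows (j, d_j, l_j)
for j in range(1, 8):
    l = 1.0 / d                  # l_j = 1 / d_j
    values.append((j, d, l))
    d = 4.0 - l                  # d_{j+1} = 4 - l_j

for j, dj, lj in values:
    print(f"{j}  {dj:.6f}  {lj:.6f}")

# Limiting values: the fixed points of the iteration.
print(2 + math.sqrt(3), 2 - math.sqrt(3))
```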
Exercises

1.7.9.  For  each  of  the  following  tridiagonal  systems  find  the  LU  factorization  of  the  coefficient 


(  1  2  °V 

f  4\ 

matrix,  and  then  solve  the  system,  (a) 

-l  -l  i 

x  = 

-1 

V  0  -2  3 ) 

{-6j 

/  1  — 1  0  0\  /1\ 

7  1  2 

0 

0\ 

(*>) 


V 


1 

o 

o 


2 

1 

0 


1 

4 

5 


0 
1 
6/ 


x 


0 
6 

\7J 


(c) 


V 


1 

o 

o 


3 

1 

0 


0 

4 

1 


0 
1 
iy 


X  = 


( 


V 


0\ 
2 
3 
1 ) 


1.7.10.  True  or  false:  (a)  The  product  of  two  tridiagonal  matrices  is  tridiagonal. 

(b)  The  inverse  of  a  tridiagonal  matrix  is  tridiagonal. 

1.7.11. (a) Find the LU factorization of the $n \times n$ tridiagonal matrix $A_n$ with all 2's along the
diagonal and all $-1$'s along the sub- and super-diagonals for $n = 3, 4$, and 5. (b) Use
your factorizations to solve the system $A_n x = b$, where $b = (1, 1, 1, \dots, 1)^T$. (c) Can
you write down the LU factorization of $A_n$ for general $n$? Do the entries in the factors
approach a limit as $n$ gets larger and larger? (d) Can you find the solution to the system
$A_n x = b = (1, 1, 1, \dots, 1)^T$ for general $n$?

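♣ 1.7.12. Answer Exercise 1.7.11 if the super-diagonal entries of $A_n$ are changed to +1.

♣ 1.7.13. Find the LU factorizations of

$$ \begin{pmatrix} 4 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 4 \end{pmatrix}, \qquad
   \begin{pmatrix} 4 & 1 & 0 & 1 \\ 1 & 4 & 1 & 0 \\ 0 & 1 & 4 & 1 \\ 1 & 0 & 1 & 4 \end{pmatrix}, \qquad
   \begin{pmatrix} 4 & 1 & 0 & 0 & 1 \\ 1 & 4 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 0 \\ 0 & 0 & 1 & 4 & 1 \\ 1 & 0 & 0 & 1 & 4 \end{pmatrix}. $$

Do you see a pattern? Try the $6 \times 6$ version. The following exercise should now be clear.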

Pi  \ 


(<h 
P2 


T  1.7.14.  A  tricirculant  matrix  C  = 


r 


l 

42 

Ps 


r2 

% 


r< 


V  r 


P 


n 


4n  — 1  Ui  —  1 
Pn  Qn 

for  its  (l,n)  and  (n,  1)  entries.  Tricirculant  matrices  arise  in  the  numerical  solution  of 
periodic  boundary  value  problems  and  in  spline  interpolation. 

(a)  Prove  that  if  C  =  LU  is  regular,  its  factors  have  the  form 


is  tridiagonal  except 


n 


(  1 
l 


\ 


1 


V  rn 


rrr 


L  o  1 

L 


'n  —  2 

m 


(  d1 


1 


u 

dr 


Uc 

d. 


Ur 


V r 

V‘ 


\ 


d 


n — 2 


U 

d 


V 


n  —  2 
n  —  1 


V 

U 


Jn  —  2  1  n  —  1 

(b)  Compute  the  LU  factorization  of  the  n  x  n  tricirculant  matrix 

(  1  -1  -1\ 


n  —  l 
n  —  1 
dn  > 


^n  = 


-1  2  -1 

-1  3  -1 


V 


-1  n  —  1  -1 

-1  1 J 


for  n  =  3,  5,  and  6.  What  goes  wrong  when  n  =  4? 
♦ 1.7.15. A matrix $A$ is said to have bandwidth $k$ if all entries that are more than $k$ slots away
from the main diagonal are zero: $a_{ij} = 0$ whenever $|i - j| > k$. (a) Show that a tridiagonal
matrix has bandwidth 1. (b) Write down an example of a $6 \times 6$ matrix of bandwidth 2
and one of bandwidth 3. (c) Prove that the $L$ and $U$ factors of a regular banded matrix
have the same bandwidth. (d) Find the LU factorization of the matrices you wrote down
in part (b). (e) Use the factorization to solve the system $A x = b$, where $b$ is the column
vector with all entries equal to 1. (f) How many arithmetic operations are needed to solve
$A x = b$ if $A$ is banded? (g) Prove or give a counterexample: the inverse of a banded
matrix is banded.


Pivoting  Strategies 

Let  us  now  investigate  the  practical  side  of  pivoting.  As  we  know,  in  the  irregular  situations 
when  a  zero  shows  up  in  a  diagonal  pivot  position,  a  row  interchange  is  required  to  proceed 
with  the  elimination  algorithm.  But  even  when  a  nonzero  pivot  element  is  in  place,  there 
may  be  good  numerical  reasons  for  exchanging  rows  in  order  to  install  a  more  desirable 
element  in  the  pivot  position.  Here  is  a  simple  example: 

.01 x + 1.6 y = 32.1,    x + .6 y = 22.    (1.71)

The  exact  solution  to  the  system  is  easily  found: 

x  =  10,  y  =  20. 


Suppose  we  are  working  with  a  very  primitive  calculator  that  only  retains  3  digits  of 
accuracy.  (Of  course,  this  is  not  a  very  realistic  situation,  but  the  example  could  be 
suitably  modified  to  produce  similar  difficulties  no  matter  how  many  digits  of  accuracy  our 
computer  is  capable  of  retaining.)  The  augmented  matrix  is 


\[ \left(\begin{array}{cc|c} .01 & 1.6 & 32.1 \\ 1 & .6 & 22 \end{array}\right). \]

Choosing the (1,1) entry as our pivot, and subtracting 100 times the first row from the
second produces the upper triangular form

\[ \left(\begin{array}{cc|c} .01 & 1.6 & 32.1 \\ 0 & -159.4 & -3188 \end{array}\right). \]

Since our calculator has only three-place accuracy, it will round the entries in the second
row, producing the augmented coefficient matrix

\[ \left(\begin{array}{cc|c} .01 & 1.6 & 32.1 \\ 0 & -159.0 & -3190 \end{array}\right). \]

The solution by Back Substitution gives

y = 3190/159 = 20.0628... ≈ 20.1,  and then

x = 100 (32.1 − 1.6 y) = 100 (32.1 − 32.16) ≈ 100 (32.1 − 32.2) = −10.


The  relatively  small  error  in  y  has  produced  a  very  large  error  in  x  —  not  even  its  sign  is 
correct! 
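
The loss of accuracy above can be replayed on a computer. The following is a minimal sketch, assuming that Python's `decimal` module with its precision set to 3 significant digits is an adequate stand-in for the primitive calculator (it rounds the result of every arithmetic operation to 3 digits); the function name `naive_2x2` is purely illustrative.

```python
from decimal import Decimal, getcontext

# Assumption: a context precision of 3 mimics the 3-digit calculator,
# because every arithmetic result is rounded to 3 significant digits.
getcontext().prec = 3


def naive_2x2(a11, a12, b1, a21, a22, b2):
    """Eliminate using the (1,1) entry as pivot, with no row interchange."""
    l = a21 / a11              # multiplier, here 100
    a22_new = a22 - l * a12    # -159.4, stored as -159
    b2_new = b2 - l * b1       # -3188, stored as -3190
    y = b2_new / a22_new       # Back Substitution
    x = (b1 - a12 * y) / a11
    return x, y


D = Decimal
x, y = naive_2x2(D("0.01"), D("1.6"), D("32.1"), D("1"), D("0.6"), D("22"))
print("x =", x, " y =", y)
```

The computed y is off by only half a percent, while the computed x has the wrong sign, in agreement with the hand calculation.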

The problem is that the first pivot, .01, is much smaller than the other element, 1, that
appears in the column below it. Interchanging the two rows before performing the row
operation would resolve the difficulty, even with such an inaccurate calculator! After
the interchange, we have

\[ \left(\begin{array}{cc|c} 1 & .6 & 22 \\ .01 & 1.6 & 32.1 \end{array}\right), \]

which results in the rounded-off upper triangular form

\[ \left(\begin{array}{cc|c} 1 & .6 & 22 \\ 0 & 1.594 & 31.88 \end{array}\right) \approx \left(\begin{array}{cc|c} 1 & .6 & 22 \\ 0 & 1.59 & 31.9 \end{array}\right). \]

The solution by Back Substitution now gives a respectable answer:

y = 31.9/1.59 = 20.0628... ≈ 20.1,    x = 22 − .6 y = 22 − 12.06 ≈ 22 − 12.1 = 9.9.

The general strategy, known as Partial Pivoting, says that at each stage, we should
use the largest (in absolute value) legitimate (i.e., in the pivot column on or below the
diagonal) element as the pivot, even if the diagonal element is nonzero. Partial Pivoting
can help suppress the undesirable effects of round-off errors during the computation.

In a computer implementation of pivoting, there is no need to waste processor time
physically exchanging the row entries in memory. Rather, one introduces a separate array
of pointers that serve to indicate which original row is currently in which permuted position.
More concretely, one initializes n row pointers r(1) = 1, ..., r(n) = n. Interchanging
row i and row j of the coefficient or augmented matrix is then accomplished by merely
interchanging r(i) and r(j). Thus, to access a matrix element that is currently in row i of
the augmented matrix, one merely retrieves the element that is in row r(i) in the computer's
memory. An explicit implementation of this strategy is provided in the accompanying
pseudocode program.

Gaussian Elimination With Partial Pivoting

    start
      for i = 1 to n
        set r(i) = i
      next i
      for j = 1 to n
        if m_{r(i),j} = 0 for all i ≥ j, stop; print “A is singular”
        choose i ≥ j such that |m_{r(i),j}| is maximal
        interchange r(i) ←→ r(j)
        for i = j + 1 to n
          set l_{r(i),j} = m_{r(i),j} / m_{r(j),j}
          for k = j + 1 to n + 1
            set m_{r(i),k} = m_{r(i),k} − l_{r(i),j} m_{r(j),k}
          next k
        next i
      next j
    end
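
The pointer strategy can be sketched in Python as follows. This is one possible rendering under stated assumptions, not a definitive implementation: indices are 0-based, the function name `solve_partial_pivoting` is hypothetical, a Back Substitution step (not part of the pseudocode box) is appended, and 3-digit `Decimal` arithmetic again stands in for the primitive calculator.

```python
from decimal import Decimal, localcontext


def solve_partial_pivoting(aug):
    """Solve A x = b given the n x (n+1) augmented matrix `aug`.

    Rows are never moved in memory; the pointer list r records which
    stored row currently occupies each position.  Returns (x, r).
    """
    n = len(aug)
    m = [list(row) for row in aug]   # work on a copy
    r = list(range(n))               # r[i] = stored row sitting in position i
    for j in range(n):
        # largest entry (in absolute value) in column j, on or below the diagonal
        p = max(range(j, n), key=lambda i: abs(m[r[i]][j]))
        if m[r[p]][j] == 0:
            raise ValueError("A is singular")
        r[j], r[p] = r[p], r[j]      # interchange pointers, not rows
        for i in range(j + 1, n):
            l = m[r[i]][j] / m[r[j]][j]
            for k in range(j + 1, n + 1):
                m[r[i]][k] = m[r[i]][k] - l * m[r[j]][k]
    # Back Substitution, reading the rows through the pointers
    x = [None] * n
    for j in reversed(range(n)):
        s = m[r[j]][n]
        for k in range(j + 1, n):
            s = s - m[r[j]][k] * x[k]
        x[j] = s / m[r[j]][j]
    return x, r


# System (1.71) in simulated 3-digit arithmetic.
with localcontext() as ctx:
    ctx.prec = 3
    D = Decimal
    x3, r3 = solve_partial_pivoting([[D("0.01"), D("1.6"), D("32.1")],
                                     [D("1"), D("0.6"), D("22")]])
print(x3, r3)
```

The returned pointer list shows that the second stored row was used as the first pivot row, and the 3-digit result agrees with the hand computation x ≈ 9.9, y ≈ 20.1. The same function accepts ordinary floats.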

Partial pivoting will solve most problems, although there can still be difficulties. For
instance, it does not accurately solve the system

10 x + 1600 y = 32100,    x + .6 y = 22,

obtained by multiplying the first equation in (1.71) by 1000. The tip-off is that, while the
entries in the column containing the pivot are smaller, those in its row are much larger. The
solution to this difficulty is Full Pivoting, in which one also performs column interchanges,
preferably with a column pointer, to move the largest legitimate element into the
pivot position. In practice, a column interchange amounts to reordering the variables in
the system, which, as long as one keeps proper track of the order, also doesn't change the
solutions. Thus, switching the order of x, y leads to the augmented matrix

\[ \left(\begin{array}{cc|c} 1600 & 10 & 32100 \\ .6 & 1 & 22 \end{array}\right), \]

in which the first column now refers to y and the second to x. Now Gaussian Elimination
will produce a reasonably accurate solution to the system.
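
A hedged sketch of Full Pivoting with both row and column pointers follows. The function name `solve_full_pivoting`, the 0-based indexing, and the use of 3-digit `Decimal` arithmetic to imitate the calculator are assumptions made for illustration; the column pointer `c` records which variable currently sits in each pivot position.

```python
from decimal import Decimal, localcontext


def solve_full_pivoting(aug):
    """Solve A x = b from the n x (n+1) augmented matrix, using pointers."""
    n = len(aug)
    m = [list(row) for row in aug]
    r = list(range(n))   # row pointers
    c = list(range(n))   # column pointers: c[j] = variable in pivot position j
    for j in range(n):
        # largest entry (in absolute value) of the remaining submatrix
        p, q = max(((i, k) for i in range(j, n) for k in range(j, n)),
                   key=lambda ik: abs(m[r[ik[0]]][c[ik[1]]]))
        if m[r[p]][c[q]] == 0:
            raise ValueError("A is singular")
        r[j], r[p] = r[p], r[j]
        c[j], c[q] = c[q], c[j]
        for i in range(j + 1, n):
            l = m[r[i]][c[j]] / m[r[j]][c[j]]
            for k in range(j + 1, n):
                m[r[i]][c[k]] = m[r[i]][c[k]] - l * m[r[j]][c[k]]
            m[r[i]][n] = m[r[i]][n] - l * m[r[j]][n]
    # Back Substitution; results are stored in the original variable order
    x = [None] * n
    for j in reversed(range(n)):
        s = m[r[j]][n]
        for k in range(j + 1, n):
            s = s - m[r[j]][c[k]] * x[c[k]]
        x[c[j]] = s / m[r[j]][c[j]]
    return x


# The rescaled system 10x + 1600y = 32100, x + .6y = 22 in 3-digit arithmetic.
with localcontext() as ctx:
    ctx.prec = 3
    D = Decimal
    x_full = solve_full_pivoting([[D("10"), D("1600"), D("32100")],
                                  [D("1"), D("0.6"), D("22")]])
print(x_full)
```

Because the entry 1600 is selected as the first pivot, the variable y is eliminated first, and in this simulated arithmetic the computed values coincide with the exact solution x = 10, y = 20.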

Finally,  there  are  some  matrices  that  are  hard  to  handle  even  with  sophisticated  pivoting 
strategies.  Such  ill-conditioned  matrices  are  typically  characterized  by  being  “almost” 
singular.  A  famous  example  of  an  ill-conditioned  matrix  is  the  n  x  n  Hilbert  matrix 


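\[
H_n = \begin{pmatrix}
1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \cdots & \frac{1}{n} \\
\frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \cdots & \frac{1}{n+1} \\
\frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \cdots & \frac{1}{n+2} \\
\frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \cdots & \frac{1}{n+3} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{1}{n} & \frac{1}{n+1} & \frac{1}{n+2} & \frac{1}{n+3} & \cdots & \frac{1}{2n-1}
\end{pmatrix}. \tag{1.72}
\]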


Later, in Proposition 3.40, we will prove that $H_n$ is nonsingular for all n. However, the
solution of a linear system whose coefficient matrix is a Hilbert matrix $H_n$, even for moderately
large n, is a very challenging problem, even using high precision computer arithmetic†. This
is because the larger n is, the closer $H_n$ is, in a sense, to being singular. A full discussion
of the so-called condition number of a matrix can be found in Section 8.7.

The reader is urged to try the following computer experiment. Fix a moderately large
value of n, say 20. Choose a column vector x with n entries chosen at random. Compute
$b = H_n x$ directly. Then try to solve the system $H_n x = b$ by Gaussian Elimination, and
compare the result with the original vector x. If you obtain an accurate solution with
n = 20, try n = 50 or 100. This will give you a good indicator of the degree of arithmetic
precision used by your computer hardware, and the accuracy of the numerical solution
algorithm(s) in your software.
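
One way to set up this experiment is sketched below in plain Python with double precision floats. The helper names (`hilbert`, `gauss_solve`, `experiment`), the fixed random seed, and the choice to swap rows physically rather than through pointers are illustrative assumptions; the error that is printed depends on the hardware and software used, so no particular values are claimed here.

```python
import random


def hilbert(n):
    """The n x n Hilbert matrix in floating point; entry (i, j) is 1/(i+j-1)."""
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]


def gauss_solve(a, b):
    """Gaussian Elimination with Partial Pivoting (rows swapped physically)."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(m[i][j]))
        if m[p][j] == 0.0:
            raise ValueError("matrix is singular to working precision")
        m[j], m[p] = m[p], m[j]
        for i in range(j + 1, n):
            l = m[i][j] / m[j][j]
            for k in range(j, n + 1):
                m[i][k] -= l * m[j][k]
    x = [0.0] * n
    for j in reversed(range(n)):
        s = m[j][n] - sum(m[j][k] * x[k] for k in range(j + 1, n))
        x[j] = s / m[j][j]
    return x


def experiment(n, seed=0):
    """Largest componentwise error when recovering a random x from b = H_n x."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    h = hilbert(n)
    b = [sum(h[i][j] * x[j] for j in range(n)) for i in range(n)]
    y = gauss_solve(h, b)
    return max(abs(xi - yi) for xi, yi in zip(x, y))


if __name__ == "__main__":
    for n in (5, 10, 20):
        try:
            print(n, experiment(n))
        except ValueError as err:
            print(n, err)
```

Running the loop for increasing n shows how quickly the recovered vector drifts away from the original x.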


† In computer algebra systems such as Maple and Mathematica, one can use exact rational
arithmetic to perform the computations. Then the important issues are time and computational
efficiency. Incidentally, there is an explicit formula for the inverse of a Hilbert matrix, which
appears in Exercise 1.7.23.

Exercises

1.7.16.  (a)  Find  the  exact  solution  to  the  linear  system 


(b)  Solve 


the  system  using  Gaussian  Elimination  with  2-digit  rounding,  (c)  Solve  the  system  using 
Partial  Pivoting  and  2-digit  rounding,  (d)  Compare  your  answers  and  discuss. 

1.7.17.  (a)  Find  the  exact  solution  to  the  linear  system  x  —  5y  —  z  =  1,  ^x  —  ^y  -\-  z  =  0, 

2x  —  y  =  3.  (b)  Solve  the  system  using  Gaussian  Elimination  with  4-digit  rounding. 

(c)  Solve  the  system  using  Partial  Pivoting  and  4-digit  rounding.  Compare  your  answers. 

1.7.18.  Answer Exercise 1.7.17 for the system

x + 4y − 3z = −3,    25x + 97y − 35z = 39,    35x − 22y + 33z = −15.

1.7.19.  Employ  2  digit  arithmetic  with  rounding  to  compute  an  approximate  solution  of  the 
linear  system  0.2 x  -\-  2y  —  3z  =  6,  5x  +  A3y  27 z  =  58,  3x  +  23y  —  42 z  =  —87, 
using  the  following  methods:  (a)  Regular  Gaussian  Elimination  with  Back  Substitution; 
(b)  Gaussian  Elimination  with  Partial  Pivoting;  (c)  Gaussian  Elimination  with  Full 
Pivoting,  (d)  Compare  your  answers  and  discuss  their  accuracy. 

1.7.20.  Solve  the  following  systems  by  hand,  using  pointers  instead  of  physically  interchanging 


the  rows:  (a) 
V3
( b )

1.7.21.  Solve the following systems using Partial Pivoting and pointers:
(a) 


(c) 


/ 
1

2 

1 
(1

2 

(  x ^ 

Pi 

( b ) 

4 

-2 

i 

y 

— 

3 

^3 

5 

-i/ 
w

/  1\ 
-2 
0 

V  1/ 


(d) 


/.  01  4 

2  -802 
v  7  .03 


1.7.22.  Use  Full  Pivoting  with  pointers  to  solve  the  systems  in  Exercise  1.7.21. 

4b  1.7.23.  Let $H_n$ be the n × n Hilbert matrix (1.72), and $K_n = H_n^{-1}$ its inverse. It can be
proved, [40; p. 513], that the (i, j) entry of $K_n$ is

\[
(-1)^{i+j} \, (i+j-1) \binom{n+i-1}{n-j} \binom{n+j-1}{n-i} \binom{i+j-2}{i-1}^{2},
\qquad \text{where} \qquad \binom{n}{k} = \frac{n!}{k! \, (n-k)!}
\]

is the standard binomial coefficient. (Warning. Proving this formula is a nontrivial
combinatorial challenge.) (a) Write down the inverse of the Hilbert
matrices $H_3$, $H_4$, using the formula or the Gauss-Jordan Method with exact rational
arithmetic. Check your results by multiplying the matrix by its inverse.

(b) Recompute the inverses on your computer using floating point arithmetic and compare
with the exact answers. (c) Try using floating point arithmetic to find $K_{10}$ and $K_{20}$. Test
the answer by multiplying the Hilbert matrix by its computed inverse.
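
For readers who want to check their work, the closed form for the inverse can be tested in exact rational arithmetic. The sketch below assumes the standard formula $(K_n)_{ij} = (-1)^{i+j}(i+j-1)\binom{n+i-1}{n-j}\binom{n+j-1}{n-i}\binom{i+j-2}{i-1}^2$ and Python 3.8 or later (for `math.comb`); the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def hilbert_exact(n):
    """The n x n Hilbert matrix with exact rational entries 1/(i+j-1)."""
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def hilbert_inverse(n):
    """Entries of K_n from the closed-form binomial formula (all integers)."""
    return [[(-1) ** (i + j) * (i + j - 1)
             * comb(n + i - 1, n - j) * comb(n + j - 1, n - i)
             * comb(i + j - 2, i - 1) ** 2
             for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def matmul(a, b):
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


for n in (3, 4):
    print(n, matmul(hilbert_exact(n), hilbert_inverse(n)) == identity(n))
```

Since every operation is exact, the comparison with the identity matrix is a genuine check rather than a test up to round-off.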

4b  1.7.24.  (a)  Write  out  a  pseudo-code  algorithm,  using  both  row  and  column  pointers,  for 

Gaussian  Elimination  with  Full  Pivoting,  (b)  Implement  your  code  on  a  computer,  and 
try  it  on  the  systems  in  Exercise  1.7.21. 

So far, we have treated only linear systems involving the same number of equations as
unknowns, and then only those with nonsingular coefficient matrices. These are precisely
the systems that always have a unique solution. We now turn to the problem of solving a
general linear system of m equations in n unknowns. The cases not treated as yet are
nonsquare systems, with m ≠ n, as well as square systems with singular coefficient matrices.
The basic idea underlying the Gaussian Elimination algorithm for nonsingular systems can
be straightforwardly adapted to these cases, too. One systematically applies the same two
types of elementary row operation to reduce the coefficient matrix to a simplified form that
generalizes the upper triangular form we aimed for in the nonsingular situation.


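Definition 1.38.  An m × n matrix U is said to be in row echelon form if it has the
following “staircase” structure:

\[
U = \begin{pmatrix}
\circledast & * & \cdots & * & * & \cdots & * & * & \cdots & \cdots & * & * & \cdots & * \\
0 & 0 & \cdots & 0 & \circledast & \cdots & * & * & \cdots & \cdots & * & * & \cdots & * \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 & \circledast & \cdots & \cdots & * & * & \cdots & * \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & \ddots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & \cdots & \circledast & * & \cdots & * \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}
\]

The entries indicated by ⊛ are the pivots, and must be nonzero. The first r rows of U each
contain exactly one pivot, but not all columns are required to include a pivot entry. The
entries below the “staircase” are all zero, while the non-pivot
entries above the staircase, indicated by stars, can be anything. The last m − r rows are
identically zero, and do not contain any pivots. There may, in exceptional situations, be
one or more all zero initial columns. Here is an explicit example of a matrix in row echelon
form:

\[
\begin{pmatrix}
3 & 1 & 0 & 4 & 5 & -7 \\
0 & -1 & -2 & 1 & 8 & 0 \\
0 & 0 & 0 & 0 & 2 & -4 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]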

The  three  pivots  are  the  first  nonzero  entries  in  the  three  nonzero  rows,  namely,  3,  —1,  2. 


Slightly  more  generally,  U  may  have  several  initial  columns  consisting  of  all  zeros.  An 
example  is  the  row  echelon  matrix 


\[
\begin{pmatrix}
0 & 0 & 3 & 5 & -2 & 0 \\
0 & 0 & 0 & 0 & 5 & 3 \\
0 & 0 & 0 & 0 & 0 & -7 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix},
\]

which  also  has  three  pivots.  The  latter  matrix  corresponds  to  a  linear  system  in  which  the 
first  two  variables  do  not  appear  in  any  of  the  equations.  Thus,  such  row  echelon  forms 
almost  never  appear  in  applications. 




Proposition  1.39.  Every  matrix  can  be  reduced  to  row  echelon  form  by  a  sequence  of 
elementary  row  operations  of  types  #1  and  #2. 


In  matrix  language,  Proposition  1.39  implies  that  if  A  is  any  m  x  n  matrix,  then  there 
exists  an  m  x  m  permutation  matrix  P  and  an  m  x  m  lower  unitriangular  matrix  L  such 
that 

PA  =  LU ,  (1.73) 

where  U  is  an  m  x  n  row  echelon  matrix.  The  factorization  (1.73)  is  not  unique.  Observe 
that  P  and  L  are  square  matrices  of  the  same  size,  while  A  and  U  are  rectangular,  also  of 
the  same  size.  As  with  a  square  matrix,  the  entries  of  L  below  the  diagonal  correspond  to 
the  row  operations  of  type  #1,  while  P  keeps  track  of  row  interchanges.  As  before,  one 
can  keep  track  of  row  interchanges  with  a  row  pointer. 

A constructive proof of this result is based on the general Gaussian Elimination algorithm,
which proceeds as follows. Starting on the left of the matrix, one searches for the
first  column  that  is  not  identically  zero.  Any  of  the  nonzero  entries  in  that  column  may 
serve  as  the  pivot.  Partial  pivoting  indicates  that  it  is  probably  best  to  choose  the  largest 
one,  although  this  is  not  essential  for  the  algorithm  to  proceed.  One  places  the  chosen 
pivot  in  the  first  row  of  the  matrix  via  a  row  interchange,  if  necessary.  The  entries  below 
the  pivot  are  made  equal  to  zero  by  the  appropriate  elementary  row  operations  of  type  #1. 
One  then  proceeds  iteratively,  performing  the  same  reduction  algorithm  on  the  submatrix 
consisting  of  all  entries  strictly  to  the  right  and  below  the  pivot.  The  algorithm  terminates 
when  either  there  is  a  nonzero  pivot  in  the  last  row,  or  all  of  the  rows  lying  below  the  last 
pivot  are  identically  zero,  and  so  no  more  pivots  can  be  found. 
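
The reduction just described can be sketched in a few lines of Python. This is an illustrative version under stated assumptions: it uses exact `Fraction` arithmetic, picks the first nonzero entry in each column as the pivot (rather than the largest, which the text notes is not essential), and the function name `row_echelon` is hypothetical. It returns the row echelon matrix together with the 0-based pivot positions.

```python
from fractions import Fraction


def row_echelon(a):
    """Reduce a matrix to row echelon form using row operations #1 and #2.

    Returns (u, pivots), where pivots lists the (row, column) positions of
    the pivots, counted from 0.
    """
    u = [[Fraction(v) for v in row] for row in a]
    m, n = len(u), len(u[0])
    pivots = []
    row = 0                       # next row waiting to receive a pivot
    for col in range(n):
        if row == m:
            break                 # a pivot has been placed in the last row
        # first nonzero entry in this column, on or below the current row
        p = next((i for i in range(row, m) if u[i][col] != 0), None)
        if p is None:
            continue              # no pivot candidate; move to the next column
        u[row], u[p] = u[p], u[row]          # row interchange (type #2)
        for i in range(row + 1, m):          # eliminate below the pivot (type #1)
            l = u[i][col] / u[row][col]
            u[i] = [x - l * y for x, y in zip(u[i], u[row])]
        pivots.append((row, col))
        row += 1
    return u, pivots


A = [[1, 3, 2, -1, 0],
     [2, 6, 1, 4, 3],
     [-1, -3, -3, 3, 1],
     [3, 9, 8, -7, 2]]
U, pivots = row_echelon(A)
print(pivots, len(pivots))
```

The number of pivots returned is the number of nonzero rows of the echelon form; the sample matrix has three pivots, in columns 1, 3, and 5 when counted from 1.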

Example 1.40.  The easiest way to learn the general Gaussian Elimination algorithm is
to follow through an illustrative example. Consider the linear system

x + 3y + 2z − u = a,
2x + 6y + z + 4u + 3v = b,
−x − 3y − 3z + 3u + v = c,
3x + 9y + 8z − 7u + 2v = d,    (1.74)

of 4 equations in 5 unknowns, where a, b, c, d are given numbers†. The coefficient matrix is

\[
A = \begin{pmatrix}
1 & 3 & 2 & -1 & 0 \\
2 & 6 & 1 & 4 & 3 \\
-1 & -3 & -3 & 3 & 1 \\
3 & 9 & 8 & -7 & 2
\end{pmatrix}.
\]

To solve the system, we introduce the augmented matrix

\[
\left(\begin{array}{ccccc|c}
1 & 3 & 2 & -1 & 0 & a \\
2 & 6 & 1 & 4 & 3 & b \\
-1 & -3 & -3 & 3 & 1 & c \\
3 & 9 & 8 & -7 & 2 & d
\end{array}\right), \tag{1.75}
\]

obtained by appending the right-hand side of the system. The upper left entry is nonzero,
and so can serve as the first pivot. We eliminate the entries below it by elementary row
operations, resulting in

\[
\left(\begin{array}{ccccc|c}
1 & 3 & 2 & -1 & 0 & a \\
0 & 0 & -3 & 6 & 3 & b - 2a \\
0 & 0 & -1 & 2 & 1 & c + a \\
0 & 0 & 2 & -4 & 2 & d - 3a
\end{array}\right).
\]

Now, the second column contains no suitable nonzero entry to serve as the second pivot.
(The top entry already lies in a row containing a pivot, and so cannot be used.) Therefore,
we move on to the third column, choosing the (2, 3) entry, −3, as our second pivot. Again,
we eliminate the entries below it, leading to

\[
\left(\begin{array}{ccccc|c}
1 & 3 & 2 & -1 & 0 & a \\
0 & 0 & -3 & 6 & 3 & b - 2a \\
0 & 0 & 0 & 0 & 0 & c - \frac{1}{3} b + \frac{5}{3} a \\
0 & 0 & 0 & 0 & 4 & d + \frac{2}{3} b - \frac{13}{3} a
\end{array}\right).
\]

The fourth column has no pivot candidates, and so the final pivot is the 4 in the fifth
column. We interchange the last two rows in order to place the coefficient matrix in row
echelon form:

\[
\left(\begin{array}{ccccc|c}
1 & 3 & 2 & -1 & 0 & a \\
0 & 0 & -3 & 6 & 3 & b - 2a \\
0 & 0 & 0 & 0 & 4 & d + \frac{2}{3} b - \frac{13}{3} a \\
0 & 0 & 0 & 0 & 0 & c - \frac{1}{3} b + \frac{5}{3} a
\end{array}\right). \tag{1.76}
\]

† It will be convenient to work with the right-hand side in general form, although the reader
may prefer, at least initially, to assign numerical values to a, b, c, d.


There  are  three  pivots,  1,-3,  and  4,  sitting  in  positions  (1,1),  (2,3),  and  (3,5).  Note 
the  staircase  form,  with  the  pivots  on  the  steps  and  everything  below  the  staircase  being 
zero.  Recalling  the  row  operations  used  to  construct  the  solution  (and  keeping  in  mind 
that  the  row  interchange  that  appears  at  the  end  also  affects  the  entries  of  L),  we  find  the 
factorization  (1.73)  takes  the  explicit  form 


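\[
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix}
1 & 3 & 2 & -1 & 0 \\
2 & 6 & 1 & 4 & 3 \\
-1 & -3 & -3 & 3 & 1 \\
3 & 9 & 8 & -7 & 2
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
2 & 1 & 0 & 0 \\
3 & -\frac{2}{3} & 1 & 0 \\
-1 & \frac{1}{3} & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 3 & 2 & -1 & 0 \\
0 & 0 & -3 & 6 & 3 \\
0 & 0 & 0 & 0 & 4 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]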

We  shall  return  to  find  the  solution  to  our  linear  system  after  a  brief  theoretical  interlude. 


Warning. In the augmented matrix, pivots can never appear in the last column,
representing the right-hand side of the system. Thus, even if $c - \frac{1}{3} b + \frac{5}{3} a \neq 0$, that entry does
not qualify as a pivot.

We  now  introduce  the  most  important  numerical  quantity  associated  with  a  matrix. 


Definition  1.41.  The  rank  of  a  matrix  is  the  number  of  pivots. 


For  instance,  the  rank  of  the  matrix  (1.75)  equals  3,  since  its  reduced  row  echelon  form, 
i.e.,  the  first  five  columns  of  (1.76),  has  three  pivots.  Since  there  is  at  most  one  pivot  per 
row  and  one  pivot  per  column,  the  rank  of  an  m  x  n  matrix  is  bounded  by  both  m  and  n, 
and  so 

0 ≤ r = rank A ≤ min{m, n}.    (1.77)

The  only  m  x  n  matrix  of  rank  0  is  the  zero  matrix  O  —  which  is  the  only  matrix  without 
any  pivots. 

Proposition 1.42.  A square matrix of size n × n is nonsingular if and only if its rank is
equal to n.

Indeed,  the  only  way  an  n  x  n  matrix  can  end  up  having  n  pivots  is  if  its  reduced  row 
echelon  form  is  upper  triangular  with  nonzero  diagonal  entries.  But  a  matrix  that  reduces 
to  such  triangular  form  is,  by  definition,  nonsingular. 

Interestingly,  the  rank  of  a  matrix  does  not  depend  on  which  elementary  row  operations 
are  performed  along  the  way  to  row  echelon  form.  Indeed,  performing  a  different  sequence  of 
row  operations  —  say  using  Partial  Pivoting  versus  no  pivoting  —  can  produce  a  completely 
different  reduced  form.  The  remarkable  result  is  that  all  such  row  echelon  forms  end  up 
having  exactly  the  same  number  of  pivots,  and  this  number  is  the  rank  of  the  matrix.  A 
formal  proof  of  this  fact  will  appear  in  Chapter  2;  see  Theorem  2.49. 

Once the coefficient matrix has been reduced to row echelon form, producing the augmented
matrix ( U | c ), the solution to the equivalent linear system U x = c proceeds as follows.
The first step is to see whether
there are any equations that do not have a solution. Suppose one of the rows in the
echelon form U is identically zero, but the corresponding entry in the last column c of
the augmented matrix is nonzero. What linear equation would this represent? Well, the
coefficients of all the variables are zero, and so the equation is of the form

$0 = c_i$,    (1.78)

where i is the row's index. If $c_i \neq 0$, then the equation cannot be satisfied: it is
inconsistent. The reduced system does not have a solution. Since the reduced system was
obtained by elementary row operations, the original linear system is incompatible, meaning
it also has no solutions. Note: It takes only one inconsistency to render the entire system
incompatible. On the other hand, if $c_i = 0$, so the entire row in the augmented matrix is
zero, then (1.78) is merely 0 = 0, and is trivially satisfied. Such all-zero rows do not affect
the solvability of the system.

In our example, the last row in the echelon form (1.76) is all zero, and hence the
last entry in the final column must also vanish in order that the system be compatible.
Therefore, the linear system (1.74) will have a solution if and only if the right-hand sides
a, b, c, d satisfy the linear constraint

$\frac{5}{3} a - \frac{1}{3} b + c = 0$.    (1.79)

In  general,  if  the  system  is  incompatible,  there  is  nothing  else  to  do.  Otherwise,  every 
all  zero  row  in  the  row  echelon  form  of  the  coefficient  matrix  also  has  a  zero  entry  in  the 
last  column  of  the  augmented  matrix;  the  system  is  compatible  and  admits  one  or  more 
solutions.  (If  there  are  no  all-zero  rows  in  the  coefficient  matrix,  meaning  that  every  row 
contains  a  pivot,  then  the  system  is  automatically  compatible.)  To  find  the  solution(s),  we 
split  the  variables  in  the  system  into  two  classes. 

Definition 1.43.  In a linear system U x = c in row echelon form, the variables
corresponding to columns containing a pivot are called basic variables, while the variables
corresponding to the columns without a pivot are called free variables.

The solution to the system then proceeds by an adaptation of the Back Substitution
procedure. Working in reverse order, each nonzero equation is solved for the basic variable
associated with its pivot. The result is substituted into the preceding equations before
they in turn are solved. The solution then specifies all the basic variables as certain
combinations of the remaining free variables. As their name indicates, the free variables, if
any, are allowed to take on any values whatsoever, and so serve to parameterize the general
solution to the system.

Example 1.44.  Let us illustrate the solution procedure with our particular system
(1.74). The values a = 0, b = 3, c = 1, d = 1 satisfy the consistency constraint (1.79), and
the corresponding reduced augmented matrix (1.76) is

\[
\left(\begin{array}{ccccc|c}
1 & 3 & 2 & -1 & 0 & 0 \\
0 & 0 & -3 & 6 & 3 & 3 \\
0 & 0 & 0 & 0 & 4 & 3 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right).
\]

The pivots are found in columns 1, 3, 5, and so the corresponding variables, x, z, v, are
basic; the other variables, y, u, corresponding to the non-pivot columns 2, 4, are free. Our
task is to solve the reduced system

x + 3y + 2z − u = 0,
−3z + 6u + 3v = 3,
4v = 3,
0 = 0,

for the basic variables x, z, v in terms of the free variables y, u. As before, this is done in the
reverse order, by successively substituting the resulting values in the preceding equation.
The result is the general solution

$v = \frac{3}{4}$,    $z = -1 + 2u + v = -\frac{1}{4} + 2u$,    $x = -3y - 2z + u = \frac{1}{2} - 3y - 3u$.

The free variables y, u remain completely arbitrary; any assigned values will produce a
solution to the original system. For instance, if y = −1, u = π, then $x = \frac{7}{2} - 3\pi$,
$z = -\frac{1}{4} + 2\pi$, $v = \frac{3}{4}$. But keep in mind that this is merely one of an infinite number
of valid solutions.
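
As a sanity check on the general solution, one can substitute it back into the original system (1.74) for several choices of the free variables. The sketch below assumes the right-hand sides a = 0, b = 3, c = 1, d = 1 of this example and uses exact `Fraction` arithmetic; the function names are illustrative only.

```python
from fractions import Fraction as F

# Coefficient matrix of (1.74) and the right-hand side (a, b, c, d) = (0, 3, 1, 1).
A = [[1, 3, 2, -1, 0],
     [2, 6, 1, 4, 3],
     [-1, -3, -3, 3, 1],
     [3, 9, 8, -7, 2]]
rhs = [0, 3, 1, 1]


def general_solution(y, u):
    """Basic variables x, z, v expressed through the free variables y, u."""
    v = F(3, 4)
    z = F(-1, 4) + 2 * u
    x = F(1, 2) - 3 * y - 3 * u
    return [x, y, z, u, v]


def residual(sol):
    """A * sol - rhs, which should vanish for every solution."""
    return [sum(a * s for a, s in zip(row, sol)) - b for row, b in zip(A, rhs)]


checks = [residual(general_solution(F(y), F(u)))
          for y in (-1, 0, 2) for u in (0, 1, F(5, 3))]
print(all(r == [0, 0, 0, 0] for r in checks))
```

Every choice of y and u gives a zero residual, which illustrates that the two free variables parameterize the whole solution set.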

In general, if the m × n coefficient matrix of a system of m linear equations in n unknowns
has rank r, there are m − r all-zero rows in the row echelon form, and these m − r equations
must have zero right-hand side in order that the system be compatible and have a solution.
Moreover, there is a total of r basic variables and n − r free variables, and so the general
solution depends upon n − r parameters.

Summarizing  the  preceding  discussion,  we  have  learned  that  there  are  only  three  possible 
outcomes  for  the  solution  to  a  system  of  linear  equations. 

Theorem 1.45.  A system Ax = b of m linear equations in n unknowns has either
(i) exactly one solution, (ii) infinitely many solutions, or (iii) no solution.

Case (iii) occurs if the system is incompatible, producing a zero row in the echelon form
that has a nonzero right-hand side. Case (ii) occurs if the system is compatible and there
are one or more free variables, and so the rank of the coefficient matrix is strictly less than
the number of columns: r < n. Case (i) occurs for nonsingular square coefficient matrices,
and, more generally, for compatible systems for which r = n, implying there are no free
variables. Since r ≤ m, this case can arise only if the coefficient matrix has at least as
many rows as columns, i.e., the linear system has at least as many equations as unknowns.
A linear system can never have a finite number, other than 0 or 1, of solutions. As
a consequence, any linear system that admits two or more solutions automatically has
infinitely many!
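
The three cases can be told apart mechanically from the row echelon form of the augmented matrix. The following sketch, with the hypothetical function name `classify` and exact `Fraction` arithmetic, looks for pivots only in the coefficient columns, then checks the right-hand sides of the all-zero rows and compares the rank with the number of unknowns.

```python
from fractions import Fraction


def classify(aug):
    """Classify the system with augmented matrix `aug` (last column is b)."""
    u = [[Fraction(v) for v in row] for row in aug]
    m, n = len(u), len(u[0]) - 1       # m equations, n unknowns
    rank = 0
    for col in range(n):               # pivots may not lie in the last column
        if rank == m:
            break
        p = next((i for i in range(rank, m) if u[i][col] != 0), None)
        if p is None:
            continue
        u[rank], u[p] = u[p], u[rank]
        for i in range(rank + 1, m):
            l = u[i][col] / u[rank][col]
            u[i] = [x - l * y for x, y in zip(u[i], u[rank])]
        rank += 1
    # rows rank, ..., m-1 of the coefficient part are now identically zero
    if any(u[i][n] != 0 for i in range(rank, m)):
        return "no solution"
    return "exactly one solution" if rank == n else "infinitely many solutions"


print(classify([[1, 1, 2], [1, -1, 0]]))
print(classify([[1, 1, 2], [2, 2, 4]]))
print(classify([[1, 1, 2], [2, 2, 5]]))
```

The three sample systems are small illustrations chosen here, one for each case of the theorem.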

[Figure 1.1. Intersecting Planes. Panels: Unique Solution; Infinitely Many Solutions.]


Warning. This property requires linearity, and is not valid for nonlinear systems. For
instance, the real quadratic equation $x^2 + x - 2 = 0$ has exactly two real solutions: x = 1
and x = −2.


Example 1.46.  Consider the linear system

y + 4z = a,    3x − y + 2z = b,    x + y + 6z = c,

consisting of three equations in three unknowns. The augmented coefficient matrix is

\[
\left(\begin{array}{ccc|c}
0 & 1 & 4 & a \\
3 & -1 & 2 & b \\
1 & 1 & 6 & c
\end{array}\right).
\]

Interchanging the first two rows, and then eliminating the elements below the first pivot
leads to

\[
\left(\begin{array}{ccc|c}
3 & -1 & 2 & b \\
0 & 1 & 4 & a \\
0 & \frac{4}{3} & \frac{16}{3} & c - \frac{1}{3} b
\end{array}\right).
\]

The second pivot is in the (2, 2) position, but after eliminating the entry below it, we find
the row echelon form to be

\[
\left(\begin{array}{ccc|c}
3 & -1 & 2 & b \\
0 & 1 & 4 & a \\
0 & 0 & 0 & c - \frac{1}{3} b - \frac{4}{3} a
\end{array}\right).
\]

Since there is a row of all zeros, the original coefficient matrix is singular, and its rank is
only 2.

The consistency condition follows from this last row in the reduced echelon form, which
requires

$\frac{4}{3} a + \frac{1}{3} b - c = 0$.

If this is not satisfied, the system has no solutions; otherwise, it has infinitely many. The
free variable is z, since there is no pivot in the third column. The general solution is

$y = a - 4z$,    $x = \frac{1}{3} b + \frac{1}{3} y - \frac{2}{3} z = \frac{1}{3} a + \frac{1}{3} b - 2z$,

where z is arbitrary.

Geometrically, Theorem 1.45 is telling us about the possible configurations of linear
subsets (lines, planes, etc.) of an n-dimensional space. For example, a single linear equation
ax + by + cz = d, with (a, b, c) ≠ 0, defines a plane P in three-dimensional space. The
solutions to a system of three linear equations in three unknowns belong to all three planes;
that is, they lie in their intersection $P_1 \cap P_2 \cap P_3$. Generically, three planes intersect in
a single common point; this is case (i) of the theorem, which occurs if and only if the
coefficient matrix is nonsingular. The case of infinitely many solutions occurs when the
three planes intersect in a common line, or, even more degenerately, when they all coincide.
On the other hand, parallel planes, or planes intersecting in parallel lines, have no common
point of intersection, and this occurs when the system is incompatible and has no solutions.
There are no other possibilities: the total number of points in the intersection is either 0,
1, or ∞. Some sample geometric configurations appear in Figure 1.1.


Exercises 


1.8.1.  Which of the following systems has (i) a unique solution? (ii) infinitely many
solutions? (iii) no solution? In each case, find all solutions:

(a) x − 2y = 1,  3x + 2y = −3.
(b) 2x + y + 3z = 1,  x + 4y − 2z = −3.
(c) x + y − 2z = −3,  2x − y + 3z = 7,  x − 2y + 5z = 1.
(d) x − 2y + z = 6,  2x + y − 3z = −3,  x − 3y + 3z = 10.
(e) x − 2y + 2z − w = 3,  3x + y + 6z + 11w = 16,  2x − y + 4z + w = 9.
(f) 3x − 2y + z = 4,  x + 3y − 4z = −3,  2x − 3y + 5z = 7,  x − 8y + 9z = 10.
(g) x + 2y + 17z − 5w = 50,  9x − 16y + 10z − 8w = 24,  2x − 5y − 4z = −13,  6x − 12y + z − 4w = −1.

1.8.2.  Determine  if  the  following  systems  are  compatible  and,  if  so,  find  the  general  solution: 

x1  +  2x2  =  1, 


6x-|  +  3x9  =  12,  8xi  +  12x9  =  16,  2x-,  —  6x9  +  4xo  =  2, 

(a)  Ax1+2x2  =  9.  ^  6xj  +9*2  =  13.  (c)  2xi  +  5x2  ~  2>  (d)  _ x  +  3x  2x3  = -1. 

12  12  3x  +6*2  =3.  12  3 


2x^  +  2x2  +  3x3  =  1,  x1  +  x2  +  x3  +  9x4  =  8, 

(e)  £3  +  22:3=3,  (f)  £3  +  22:3  +  82:4  =  7,  (g) 

4x 1  +  5x2  +  7x3  =  15.  —  3x1  +  x3  —  7x4  =  9. 


x1  +  2x2  +  3x3  +  4x4  =  1, 
2xj^  +  4x2  +  6x3  +  5x4  =  0, 
3x1  +  4x2  +  x3  +  x4  =  0, 
4x 1  +  6x2  +  4x3  —  x4  =  0. 

1.8.3.  Graph the following planes and determine whether they have a common intersection:

x + y + z = 1,    x + y = 1,    x + z = 1.


be  the  augmented  matrix  for  a  linear  system.  For  which 


values of a and b does the system have (i) a unique solution? (ii) infinitely many
solutions? (iii) no solution?

1.8.5.  Determine  the  general  (complex)  solution  to  the  following  systems: 

x  +  2iy  +  (2-4i)z  =  5  +  5i, 


(  a 

0 

b 

2  \ 

1.8.4.  Let  A  = 

a 

2 

a 

b 

u 

2 

a 

a  / 

(a) 


2£  +  (l+  i)y-2iz  =  2i 
(1—  \)x  -\-  y  —  2\z  =  0. 


(b)  (-1  +  i  )x  4-  2y  +  (4  +  2  i  )z  =  0, 

$(1 - i)x + (1 + 4i)y - 5iz = 10 + 5i$.

(c)


x1  +  i  x2  +  x3  =  1  +  4  i , 

xl  +  x2  _  1X3  =  ~ 
ix1  —  x2  —  x3  =  —1  —  2  i . 


(2  +  i  )x  +  ir/+(2  +  2i)z  +  (l  +  12i  )w  —  0, 
(d)  (1  -  i  )x  +  y  +  (2  -  i  )z  +  (8  +  2  i  )ie  =  0, 

(3  +  2i)x  +  iy  +  (3  +  3i)z  +  19  ire  =  0. 


1.8.6. For which values of b and c does the system $x_1 + x_2 + 6x_3 = 1$, $b\,x_1 + 3x_2 - x_3 = -2$,
$3x_1 + 4x_2 + x_3 = c$, have (a) no solution? (b) exactly one solution? (c) infinitely many
solutions?

A  1 

1  -2 


1.8.7.  Determine  the  rank  of  the  following  matrices:  (a) 


O) 


2 

-2 


1 

-1 


3 

-3 


i  -l  i\ 

2  -1  0\ 

(  3  \ 

(c) 

1-12 

,  (d) 

2  -1  1 

>  (e) 

0 

V-1  1  o) 

Vi  i  -o 

(f)  (0  -1  2  5), 


(s) 

/  0  -3\ 

4  -1 

1  2 

.  (*) 

W  -5 ) 

/I 
2 
1 
4 
VO 


1 

1 

2 

1 

3 


2 

-1 

-3 

3 

-5 


1\ 

0 

-1 
2 

-2  ) 


(i) 


/0  0 
1  2 
\  2  4 


0 

-3 

-2 


3 

1 

1 


1\ 

-2 

-2 


1.8.8.  Write  out  a  PA  =  LU  factorization  for  each  of  the  matrices  in  Exercise  1.8.7. 

1.8.9.  Construct  a  system  of  three  linear  equations  in  three  unknowns  that  has 
(a)  one  and  only  one  solution;  (b)  more  than  one  solution;  (c)  no  solution. 

1.8.10.  Find  a  coefficient  matrix  A  such  that  the  associated  linear  system  Ax  =  b  has 
(a)  infinitely  many  solutions  for  every  b;  (b)  0  or  oo  solutions,  depending  on  b; 

(c)  0  or  1  solution  depending  on  b;  (d)  exactly  1  solution  for  all  b. 

1.8.11.  Give  an  example  of  a  nonlinear  system  of  two  equations  in  two  unknowns  that  has 
(a)  no  solution;  (b)  exactly  two  solutions;  (c)  exactly  three  solutions;  (d)  infinitely 
many  solutions. 

1.8.12.  What  does  it  mean  if  a  linear  system  has  a  coefficient  matrix  with  a  column  of  all  0’s? 

1.8.13. True or false: One can find an m × n matrix of rank r for every 0 ≤ r ≤ min{m, n}.

1.8.14.  True  or  false:  Every  m  x  n  matrix  has  (a)  exactly  m  pivots;  (b)  at  least  one  pivot. 

O 1.8.15. (a) Prove that the product $A = v\,w^T$ of a nonzero m × 1 column vector v and a nonzero
1 × n row vector $w^T$ is an m × n matrix of rank r = 1. (b) Compute the following rank one

/  4  \ 


products:  (i) 


1 

3 


(-1  2),  (it) 


(-2  1),  (iii) 


2 

-3 


(1  3  -1). 


0 

V-V 

(c) Prove that every rank one matrix can be written in the form $A = v\,w^T$.


A 1.8.16. (a) Let A be an m × n matrix and let $M = (A \mid b)$ be the augmented matrix for the
linear system Ax = b. Show that either (i) rank A = rank M, or (ii) rank A = rank M − 1.
(b) Prove that the system is compatible if and only if case (i) holds.


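1.8.17. Find the rank of the matrix

\[
\begin{pmatrix}
a & ar & \cdots & ar^{n-1} \\
ar^{n} & ar^{n+1} & \cdots & ar^{2n-1} \\
\vdots & \vdots & \ddots & \vdots \\
ar^{(n-1)n} & ar^{(n-1)n+1} & \cdots & ar^{n^2-1}
\end{pmatrix}
\]

when a, r ≠ 0.

1.8.18. Find the rank of the n × n matrix

\[
\begin{pmatrix}
1 & 2 & 3 & \cdots & n \\
n+1 & n+2 & n+3 & \cdots & 2n \\
2n+1 & 2n+2 & 2n+3 & \cdots & 3n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
n^2-n+1 & n^2-n+2 & n^2-n+3 & \cdots & n^2
\end{pmatrix}.
\]

1.8.19. Find two matrices A, B such that rank AB ≠ rank BA.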

0 1.8.20. Let A be an m × n matrix of rank r. (a) Suppose $C = (A \;\; B)$ is an m × k matrix, k > n,
whose first n columns are the same as the columns of A. Prove that rank C ≥ rank A. Give
an example with rank C = rank A; with rank C > rank A. (b) Let E be a j × n
matrix, j > m, whose first m rows are the same as those of A. Prove that rank E ≥ rank A.
Give an example with rank E = rank A; with rank E > rank A.

0 1.8.21. Let A be a singular square matrix. Prove that there exist elementary matrices $E_1, \ldots, E_N$
such that $A = E_1 E_2 \cdots E_N Z$, where Z is a matrix with at least one all-zero row.


Homogeneous  Systems 

A  linear  system  with  all  0’s  on  the  right-hand  side  is  called  homogeneous.  Conversely,  if 
at  least  one  of  the  right-hand  sides  is  nonzero,  the  system  is  called  inhomogeneous. 

In matrix notation, a homogeneous system takes the form

\[ A\,x = 0, \tag{1.80} \]

where the zero vector 0 indicates that every entry on the right-hand side is zero.
Homogeneous systems are always compatible, since x = 0 is a solution, known as the trivial
solution. If a homogeneous system has a nontrivial solution x ≠ 0, then Theorem 1.45
assures us that it must have infinitely many solutions. This will occur if and only if the
reduced system has one or more free variables.

Theorem 1.47. A homogeneous linear system Ax = 0 of m equations in n unknowns
has a nontrivial solution x ≠ 0 if and only if the rank of A is r < n. If m < n, the system
always has a nontrivial solution. If m = n, the system has a nontrivial solution if and only
if A is singular.

Thus,  homogeneous  systems  with  fewer  equations  than  unknowns  always  have  infinitely 
many  solutions.  Indeed,  the  coefficient  matrix  of  such  a  system  has  more  columns  than 
rows,  and  so  at  least  one  column  cannot  contain  a  pivot,  meaning  that  there  is  at  least  one 
free  variable  in  the  general  solution  formula. 

Example  1.48.  Consider  the  homogeneous  linear  system 

\[
\begin{aligned}
2x_1 + x_2 + 5x_4 &= 0, \\
4x_1 + 2x_2 - x_3 + 8x_4 &= 0, \\
-2x_1 - x_2 + 3x_3 - 4x_4 &= 0,
\end{aligned}
\]

with coefficient matrix

\[
A = \begin{pmatrix} 2 & 1 & 0 & 5 \\ 4 & 2 & -1 & 8 \\ -2 & -1 & 3 & -4 \end{pmatrix}.
\]

Since there are only three equations in four unknowns, we already know that the system
has infinitely many solutions, including the trivial solution $x_1 = x_2 = x_3 = x_4 = 0$.

When solving a homogeneous system, the final column of the augmented matrix consists
of all zeros. As such, it will never be altered by row operations, and so it is a waste of
effort to carry it along during the process. We therefore perform the Gaussian Elimination
algorithm directly on the coefficient matrix A. Working with the (1,1) entry as the first
pivot, we first obtain

\[
\begin{pmatrix} 2 & 1 & 0 & 5 \\ 0 & 0 & -1 & -2 \\ 0 & 0 & 3 & 1 \end{pmatrix}.
\]

The (2,3) entry is the second pivot, and we apply one final row operation to place the
matrix in row echelon form

\[
\begin{pmatrix} 2 & 1 & 0 & 5 \\ 0 & 0 & -1 & -2 \\ 0 & 0 & 0 & -5 \end{pmatrix}.
\]

This corresponds to the reduced homogeneous system

\[ 2x_1 + x_2 + 5x_4 = 0, \qquad -x_3 - 2x_4 = 0, \qquad -5x_4 = 0. \]

Since there are three pivots in the final row echelon form, the rank of the coefficient matrix
A is 3. There is one free variable, namely $x_2$. Using Back Substitution, we easily obtain
the general solution

\[ x_1 = -\tfrac{1}{2}\,t, \qquad x_2 = t, \qquad x_3 = x_4 = 0, \]

which depends upon a single free parameter $t = x_2$.
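The computation in Example 1.48 is easy to double-check by machine. The sketch below is only an illustration: the helper names `pivot_columns`, `general_solution`, and `apply` are hypothetical, and exact fractions are assumed so that no rounding enters. It locates the pivot columns of the coefficient matrix and then substitutes the general solution back into the three equations for several values of the free parameter t.

```python
from fractions import Fraction

A = [[2, 1, 0, 5],
     [4, 2, -1, 8],
     [-2, -1, 3, -4]]


def pivot_columns(rows):
    """Columns (0-based) that contain a pivot after Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for col in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if p is None:
            continue  # no pivot here: this column belongs to a free variable
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    return pivots


def general_solution(t):
    """The solution x1 = -t/2, x2 = t, x3 = x4 = 0 found in the example."""
    t = Fraction(t)
    return [-t / 2, t, Fraction(0), Fraction(0)]


def apply(rows, x):
    """Left-hand sides of the system evaluated at the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in rows]


print(pivot_columns(A))  # 0-based pivot columns; the remaining column is free
for t in (0, 1, -3, Fraction(7, 2)):
    assert apply(A, general_solution(t)) == [0, 0, 0]
```

The one column without a pivot is the second, in agreement with the free variable $x_2$.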


Example  1.49.  Consider  the  homogeneous  linear  system 


\[
\begin{aligned}
2x - y + 3z &= 0, \\
-4x + 2y - 6z &= 0, \\
2x - y + z &= 0, \\
6x - 3y + 3z &= 0,
\end{aligned}
\]

with coefficient matrix

\[
A = \begin{pmatrix} 2 & -1 & 3 \\ -4 & 2 & -6 \\ 2 & -1 & 1 \\ 6 & -3 & 3 \end{pmatrix}.
\]

The system admits the trivial solution x = y = z = 0, but in this case we need to complete
the elimination algorithm before we know for sure whether there are other solutions. After
the first stage in the reduction process, the coefficient matrix becomes

\[
\begin{pmatrix} 2 & -1 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & -2 \\ 0 & 0 & -6 \end{pmatrix}.
\]

To continue, we need to interchange the second and third rows to place a nonzero entry in
the final pivot position; after that, the reduction to the row echelon form

\[
\begin{pmatrix} 2 & -1 & 3 \\ 0 & 0 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]

is immediate. Thus, the system reduces to the equations

\[ 2x - y + 3z = 0, \qquad -2z = 0, \qquad 0 = 0, \qquad 0 = 0. \]

The third and fourth equations are trivially compatible, as they must be in the
homogeneous case. The rank of the coefficient matrix is equal to two, which is less than the
number of columns, and so, even though the system has more equations than unknowns,
it has infinitely many solutions. These can be written in terms of the free variable y; the
general solution is $x = \tfrac{1}{2}\,y$, $z = 0$, where y is arbitrary.
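For readers who wish to automate this kind of calculation, here is a hedged sketch that reads the solutions of a homogeneous system off the fully reduced (Gauss-Jordan) form: one solution vector is produced per free variable, by setting that variable to 1 and the other free variables to 0. The function name `kernel_basis` is a hypothetical choice, and this is one of several equivalent ways to organize the computation, not necessarily the one used later in the text.

```python
from fractions import Fraction


def kernel_basis(rows):
    """Solutions of A x = 0, one per free variable, via Gauss-Jordan reduction."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m[0])
    pivots = []  # pivots[k] is the pivot column of row k
    r = 0
    for col in range(n):
        p = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if p is None:
            continue  # free column
        m[r], m[p] = m[p], m[r]
        pivot_value = m[r][col]
        m[r] = [a / pivot_value for a in m[r]]  # scale the pivot to 1
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        # Each pivot row reads: x_pivot + (entry in free column) * x_free = 0.
        for k, pc in enumerate(pivots):
            v[pc] = -m[k][free]
        basis.append(v)
    return basis


# Coefficient matrix of Example 1.49.
A = [[2, -1, 3],
     [-4, 2, -6],
     [2, -1, 1],
     [6, -3, 3]]
print(kernel_basis(A))
```

For the matrix of Example 1.49 the routine returns a single vector, with entries 1/2, 1, 0, which is the solution x = y/2, z = 0 at y = 1.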
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Exercises 


1.8.22. Solve the following homogeneous linear systems.

(a) $x + y - 2z = 0$, $x + 4y - 3z = 0$.

(b) $2x + 3y - z = 0$, $-4x + 3y - 5z = 0$, $x - 3y + 3z = 0$.

(c) $-x + y - 4z = 0$, $-2x + 2y - 6z = 0$, $x + 3y + 3z = 0$.

(d) $x + 2y - 2z + w = 0$, $-3x + z - 2w = 0$.

(e) $-x + 3y - 2z + w = 0$, $-2x + 5y + z - 2w = 0$, $3x - 8y + z - 4w = 0$.

(f) $-y + z = 0$, $2x - 3w = 0$, $x + y - 2w = 0$, $y - 3z + w = 0$.

1.8.23. Find all solutions to the homogeneous system Ax = 0 for the coefficient matrix


(a) 


(e) 


/ 

V 


3 

-9 

0 
2 
1 


1 

3 


( b ) 


2 

3 


1 

1 


4 

2 


(c) 


(0 


/I  —2  \ 
1  -1 
2  -1 
\1  0  J 


(&) 


1 

2 

/  1 
-1 
4 

V-1 


2  3 
1  4 


3 

0 


2  0\ 
3  2 

7  2 

i  6  y 


(l 

2 

3  \ 

,  (d) 

4 

5 

6 

8 

9) 

/ 

0 

0 

3 

( h ) 

1 

2 

-1 

-2 

0 

1 

v- 

-1 

1 

1 

-3\ 
3 
5 

-4  y 


1.8.24. Let U be an upper triangular matrix. Show that the homogeneous system U x = 0
admits a nontrivial solution if and only if U has at least one 0 on its diagonal.

1.8.25. Find the solution to the homogeneous system $2x_1 + x_2 - 2x_3 = 0$, $2x_1 - x_2 - 2x_3 = 0$.
Then solve the inhomogeneous version where the right-hand sides are changed to a, b,
respectively. What do you observe?

1.8.26. Answer Exercise 1.8.25 for the system $2x_1 + x_2 + x_3 - x_4 = 0$, $2x_1 - 2x_2 - x_3 + 3x_4 = 0$.

1.8.27.  Find  all  values  of  k  for  which  the  following  homogeneous  systems  of  linear  equations 

have  a  non-trivial  solution:  x  +  ky-\-2z  =  0 

Xi  +  kx0  +  4xo  =  0, 

(»)  I  +  't»  =  0'  (b|  fcll  +  *,  +  2*3  =  0,  (c),  = 

kx  +  iy  =  0,  2*  +kx  +Sx  o.  (k  +  l)z  —  2y  —  As  =  0, 

kx  +  3y  +  6z  =  0. 


1.9  Determinants 

You may be surprised that, so far, we have not mentioned determinants, a topic that
typically assumes a central role in many treatments of basic linear algebra. Determinants
can be useful in low-dimensional and highly structured problems, and have many
fascinating properties. They also prominently feature in theoretical developments of the subject.
But, like matrix inverses, they are almost completely irrelevant when it comes to large
scale applications and practical computations. Indeed, for most matrices, the best way to
compute a determinant is (surprise) Gaussian Elimination! Consequently, from a
computational standpoint, the determinant adds no new information concerning the linear system
and its solutions. However, for completeness and in preparation for certain later
developments (particularly computing eigenvalues of small matrices), you should be familiar with
the basic facts and properties of determinants, as summarized in this final section.

The determinant of a square matrix† A is a scalar, written det A, that will distinguish
between singular and nonsingular matrices. We already encountered in (1.38) the
determinant of a 2 × 2 matrix‡:

\[ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc. \]

The key fact is that the determinant is nonzero if and only if the matrix has an inverse,
or, equivalently, is nonsingular. Our goal is to find an analogous quantity for general
square matrices.

† Non-square matrices do not have determinants.

‡ Some authors use vertical lines to indicate the determinant:
$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = \det \begin{pmatrix} a & b \\ c & d \end{pmatrix}$.

There are many different ways to define determinants. The difficulty is that the actual
formula is very unwieldy (see (1.87) below) and not well motivated. We prefer
an axiomatic approach that explains how our three elementary row operations affect the
determinant.

Theorem 1.50. Associated with every square matrix, there exists a uniquely defined
scalar quantity, known as its determinant, that obeys the following axioms:

(i) Adding a multiple of one row to another does not change the determinant.

(ii) Interchanging two rows changes the sign of the determinant.

(iii) Multiplying a row by any scalar (including zero) multiplies the determinant by the
same scalar.

(iv) The determinant of an upper triangular matrix U is equal to the product of its
diagonal entries: $\det U = u_{11} u_{22} \cdots u_{nn}$.

In particular, axiom (iv) implies that the determinant of the identity matrix is

\[ \det I = 1. \tag{1.81} \]

Checking that all four of these axioms hold in the 2 × 2 case is an elementary exercise.

The proof of Theorem 1.50 is based on the following results. Suppose, in particular, we
multiply a row of the matrix A by the zero scalar. The resulting matrix has a row of all
zeros, and, by axiom (iii), has zero determinant. Since any matrix with a zero row can be
obtained in this fashion, we conclude:

Lemma 1.51. Any matrix with one or more all-zero rows has zero determinant.

Using these properties, one is able to compute the determinant of any square matrix
by Gaussian Elimination, which is, in fact, the fastest and most practical computational
method in all but the simplest situations.

Theorem 1.52. If A = LU is a regular matrix, then

\[ \det A = \det U = u_{11} u_{22} \cdots u_{nn} \tag{1.82} \]

equals the product of the pivots. More generally, if A is nonsingular, and requires k row
interchanges to arrive at its permuted factorization PA = LU, then

\[ \det A = \det P \, \det U = (-1)^k \, u_{11} u_{22} \cdots u_{nn}. \tag{1.83} \]

Finally, A is singular if and only if

\[ \det A = 0. \tag{1.84} \]

Proof: In the regular case, we need only elementary row operations of type #1 to reduce
A to upper triangular form U, and axiom (i) says these do not change the determinant.
Therefore, det A = det U, the formula for the latter being given by axiom (iv). The
nonsingular case follows in a similar fashion. By axiom (ii), each row interchange changes
the sign of the determinant, and so det A equals det U if there has been an even number of
interchanges, but equals −det U if there has been an odd number. For the same reason,
the determinant of the permutation matrix P equals +1 if there has been an even number
of row interchanges, and −1 for an odd number. Finally, if A is singular, then we can
reduce it to a matrix with at least one row of zeros by elementary row operations of types
#1 and #2. Lemma 1.51 implies that the resulting matrix has zero determinant, and so
det A = 0, also. Q.E.D.

Remark. If we then apply Gauss-Jordan elimination to reduce the upper triangular
matrix U to the identity matrix I, and use axiom (iii) when each row is divided by its
pivot, we find that axiom (iv) follows from the simpler formula (1.81), which could thus
replace it in Theorem 1.50.


Example  1.53.  Let  us  compute  the  determinant  of  the  4x4  matrix 

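\[
A = \begin{pmatrix} 1 & 0 & -1 & 2 \\ 2 & 1 & -3 & 4 \\ 0 & 2 & -2 & 3 \\ 1 & 1 & -4 & -2 \end{pmatrix}.
\]

We perform our usual Gaussian Elimination algorithm, successively leading to the matrices

\[
A \longrightarrow
\begin{pmatrix} 1 & 0 & -1 & 2 \\ 0 & 1 & -1 & 0 \\ 0 & 2 & -2 & 3 \\ 0 & 1 & -3 & -4 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 0 & -1 & 2 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & -2 & -4 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 0 & -1 & 2 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & -2 & -4 \\ 0 & 0 & 0 & 3 \end{pmatrix},
\]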

where  we  used  a  single  row  interchange  to  obtain  the  final  upper  triangular  form.  Owing 
to  the  row  interchange,  the  determinant  of  the  original  matrix  is  —1  times  the  product  of 
the  pivots: 

det  A  =  -1  •  (1  •  1  •  (-2)  •  3)  =  6. 

In  particular,  this  tells  us  that  A  is  nonsingular.  But,  of  course,  this  was  already  evident, 
since  we  successfully  reduced  the  matrix  to  upper  triangular  form  with  4  nonzero  pivots. 


There  is  a  variety  of  other  approaches  to  evaluating  determinants.  However,  except 
for  very  small  (2  x  2  or  3  x  3)  matrices  or  other  special  situations,  the  most  efficient 
algorithm  for  computing  the  determinant  of  a  matrix  is  to  apply  Gaussian  Elimination, 
with  pivoting  if  necessary,  and  then  invoke  the  relevant  formula  from  Theorem  1.52.  In 
particular,  the  determinantal  criterion  (1.84)  for  singular  matrices,  while  of  theoretical 
interest,  is  unnecessary  in  practice,  since  we  will  have  already  detected  whether  the  matrix 
is  singular  during  the  course  of  the  elimination  procedure  by  observing  that  it  has  fewer 
than  the  full  number  of  pivots. 
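The procedure just described can be sketched in a few lines. The following is an illustrative implementation under stated assumptions, not a library routine: the name `det_by_elimination` is hypothetical, exact fractions stand in for floating point arithmetic, and a row interchange is performed only when the candidate pivot is zero. It applies formula (1.83), namely the sign $(-1)^k$ times the product of the pivots, and returns 0 as soon as a pivot cannot be found.

```python
from fractions import Fraction


def det_by_elimination(rows):
    """Determinant as (-1)^k times the product of the pivots, as in (1.83)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    sign = 1
    for j in range(n):
        # Find a nonzero entry in column j, at or below the diagonal.
        p = next((i for i in range(j, n) if m[i][j] != 0), None)
        if p is None:
            return Fraction(0)  # fewer than n pivots: the matrix is singular
        if p != j:
            m[j], m[p] = m[p], m[j]
            sign = -sign  # each interchange flips the sign (axiom (ii))
        for i in range(j + 1, n):
            factor = m[i][j] / m[j][j]
            m[i] = [a - factor * b for a, b in zip(m[i], m[j])]  # axiom (i)
    product = Fraction(1)
    for j in range(n):
        product *= m[j][j]
    return sign * product


# The 4 x 4 matrix of Example 1.53.
A = [[1, 0, -1, 2],
     [2, 1, -3, 4],
     [0, 2, -2, 3],
     [1, 1, -4, -2]]
print(det_by_elimination(A))  # 6
```

On the matrix of Example 1.53 the routine performs one interchange and finds the pivots 1, 1, −2, 3, reproducing det A = 6.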

Let us finish by stating a few of the basic properties of determinants. Proofs are outlined
in the exercises.

Proposition 1.54. The determinant of the product of two square matrices of the same
size is the product of their determinants:

\[ \det(AB) = \det A \, \det B. \tag{1.85} \]

Therefore, even though matrix multiplication is not commutative, and so AB ≠ BA in
general, both matrix products have the same determinant:

\[ \det(AB) = \det A \, \det B = \det B \, \det A = \det(BA), \]

because ordinary (scalar) multiplication is commutative. In particular, setting $B = A^{-1}$
and using axiom (iv), we find that the determinant of the inverse matrix is the reciprocal
of the matrix’s determinant.
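A small numerical check may make these statements concrete. The sketch below is only an illustration: the two 2 × 2 matrices are arbitrary sample choices, and the helpers `det2`, `matmul2`, and `inverse2` are hypothetical names built on the formula ad − bc. It confirms that AB and BA differ while their determinants agree, and that the determinant of the inverse is the reciprocal.

```python
from fractions import Fraction


def det2(m):
    """Determinant ad - bc of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c


def matmul2(x, y):
    """Product of two 2 x 2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse2(m):
    """Inverse of a nonsingular 2 x 2 matrix, in exact arithmetic."""
    (a, b), (c, d) = m
    delta = Fraction(det2(m))
    return [[d / delta, -b / delta], [-c / delta, a / delta]]


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
AB, BA = matmul2(A, B), matmul2(B, A)

print(AB, BA)                         # the two products differ
print(det2(AB), det2(BA))             # but their determinants coincide
print(det2(A) * det2(B))              # and equal det A times det B
print(det2(inverse2(A)), det2(A))     # reciprocal determinants
```

Here det A = −2 and det B = −5, so both products have determinant 10.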


Proposition 1.55. If A is a nonsingular matrix, then

\[ \det A^{-1} = \frac{1}{\det A}. \tag{1.86} \]


Finally, for later reference, we end with the general formula for the determinant of an
n × n matrix A with entries $a_{ij}$:

\[ \det A = \sum_{\pi} (\operatorname{sign} \pi)\, a_{\pi(1),1}\, a_{\pi(2),2} \cdots a_{\pi(n),n}. \tag{1.87} \]

The sum is over all possible permutations π of the rows of A. The sign of the permutation,
written sign π, equals the determinant of the corresponding permutation matrix P, so
sign π = det P = +1 if the permutation is composed of an even number of row interchanges
and −1 if composed of an odd number. For example, the six terms in the well-known
formula

\[
\det \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
= a_{11}a_{22}a_{33} + a_{31}a_{12}a_{23} + a_{21}a_{32}a_{13}
- a_{11}a_{32}a_{23} - a_{21}a_{12}a_{33} - a_{31}a_{22}a_{13} \tag{1.88}
\]

for a 3 × 3 determinant correspond to the six possible permutations (1.31) of a 3-rowed
matrix. A proof that the formula (1.87) satisfies the defining properties of the determinant
listed in Theorem 1.50 is tedious, but not hard. The reader might wish to try out the 3 × 3
case to be convinced that it works.
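One way to try out small cases is to evaluate the permutation sum (1.87) directly. The sketch below is an illustration only: the names `sign` and `det_by_permutations` are hypothetical, and the sign of a permutation is computed from the parity of its number of inversions, which is assumed here as the standard equivalent of counting row interchanges. The 3 × 3 test matrix is an arbitrary sample.

```python
from itertools import permutations


def sign(perm):
    """+1 for an even permutation, -1 for an odd one (parity of inversions)."""
    inversions = sum(1
                     for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det_by_permutations(a):
    """Sum over all permutations pi of sign(pi) * a[pi(1)][1] * ... * a[pi(n)][n]."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col in range(n):
            term *= a[perm[col]][col]  # the entry in row pi(col), column col
        total += term
    return total


sample = [[2, 0, 1],
          [1, 3, -1],
          [0, 5, 4]]
print(len(list(permutations(range(3)))))  # number of terms for a 3 x 3 matrix
print(det_by_permutations(sample))
```

The loop visits n! permutations, so this is useful only for very small matrices.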

The  explicit  formula  (1.87)  proves  that  the  determinant  function  is  well-defined,  and 
formally  completes  the  proof  of  Theorem  1.50.  One  consequence  of  this  formula  is  that  the 
determinant  is  unaffected  by  the  transpose  operation. 


Proposition 1.56. Transposing a matrix does not change its determinant:

\[ \det A^T = \det A. \tag{1.89} \]


Remark.  Proposition  1.56  has  the  interesting  consequence  that  one  can  equally  well 
use  “elementary  column  operations”  to  compute  determinants.  We  will  not  develop  this 
approach  in  any  detail  here,  since  it  does  not  help  us  to  solve  linear  equations. 

However, the explicit determinant formula (1.87) is not used in practice. Since there are
n! different permutations of the n rows, the determinantal sum (1.87) contains n! distinct
terms, which, as soon as n is of moderate size, renders it completely useless for practical
computations. For instance, the determinant of a 10 × 10 matrix contains 10! = 3,628,800
terms, while a 100 × 100 determinant would require summing $9.3326 \times 10^{157}$ terms, each of
which is a product of 100 matrix entries! The most efficient way to compute determinants
is still our mainstay, Gaussian Elimination, coupled with the fact that the determinant
is ± the product of the pivots! On this note, we conclude our brief introduction.
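To put rough numbers on this comparison, the sketch below tabulates the n! terms of (1.87) against a count of the multiplications and divisions used by elimination. The counting convention is an assumption made for illustration: at a stage with k rows below the pivot, each row costs one division for the multiplier and k multiplications for the entries to the right of the pivot, and the n pivots are multiplied together at the end. Both function names are hypothetical.

```python
import math


def terms_in_permutation_formula(n):
    """Number of terms in the sum (1.87) for an n x n matrix."""
    return math.factorial(n)


def multiplications_for_elimination(n):
    """Multiplications and divisions under the counting convention stated above."""
    reduce_cost = sum(k * (k + 1) for k in range(1, n))
    return reduce_cost + (n - 1)  # plus the product of the n pivots


for n in (3, 10, 100):
    print(n,
          f"{terms_in_permutation_formula(n):.4e}",
          multiplications_for_elimination(n))
```

Under this convention the elimination count grows like n³/3, whereas the number of terms in (1.87) grows factorially.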


Exercises 


1.9.1.  Use  Gaussian  Elimination  to  find  the  determinant  of  the  following  matrices: 


(a) 
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-4  3 
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(g) 
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1.9.2.  Verify  the  determinant  product  formula  (1.85)  when 
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3  \ 
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A  = 
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B  = 
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-2 

u 

-2 
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1/ 

1.9.3. (a) Give an example of a non-diagonal 2 × 2 matrix for which $A^2 = I$. (b) In general, if
$A^2 = I$, show that det A = ±1. (c) If $A^2 = A$, what can you say about det A?

1.9.4. True or false: If true, explain why. If false, give an explicit counterexample.

(a) If det A ≠ 0 then $A^{-1}$ exists. (b) det(2A) = 2 det A. (c) det(AB) = det A + det B.


(d) $\det A^{-T} = \dfrac{1}{\det A}$. (e) $\det(AB^{-1}) = \dfrac{\det A}{\det B}$. (f) $\det[(A + B)(A - B)] = \det(A^2 - B^2)$.

(g) If A is an n × n matrix with det A = 0, then rank A < n.

(h) If det A = 1 and AB = O, then B = O.


1.9.5. Prove that the similar matrices $B = S^{-1} A S$ have the same determinant: det A = det B.

1.9.6. Prove that if A is an n × n matrix and c is a scalar, then $\det(cA) = c^n \det A$.

1.9.7. Prove that the determinant of a lower triangular matrix is the product of its diagonal
entries.

1.9.8. (a) Show that if A has size n × n, then $\det(-A) = (-1)^n \det A$. (b) Prove that, for
n odd, any n × n skew-symmetric matrix $A = -A^T$ is singular. (c) Find a nonsingular
skew-symmetric matrix.

0  1.9.9.  Prove  directly  that  the  2x2  determinant  formula  (1.38)  satisfies  the  four  determinant 
axioms  listed  in  Theorem  1.50. 


0 1.9.10. In this exercise, we prove the determinantal product formula (1.85). (a) Prove that
if E is any elementary matrix (of the appropriate size), then det(EB) = det E det B.
(b) Use induction to prove that if $A = E_1 E_2 \cdots E_N$ is a product of elementary matrices,
then det(AB) = det A det B. Explain why this proves the product formula whenever A is
a nonsingular matrix. (c) Prove that if Z is a matrix with a zero row, then ZB also has a
zero row, and so det(ZB) = 0 = det Z det B. (d) Use Exercise 1.8.21 to complete the proof
of the product formula.

1.9.11.  Prove  (1.86). 

0 1.9.12. Prove (1.89). Hint: Use Exercise 1.6.30 in the regular case. Then extend to the
nonsingular case. Finally, explain why the result also holds for singular matrices.

1.9.13. Write out the formula for a 4 × 4 determinant. It should contain 24 = 4! terms.


0  1.9.14.  Show  that  (1.87)  satisfies  all  four  determinant  axioms,  and  hence  is  the  correct  formula 
for  a  determinant. 

0 1.9.15. Prove that axiom (iv) in Theorem 1.50 can be proved as a consequence of the first
three axioms and the property det I = 1.

0  1.9.16.  Prove  that  one  cannot  produce  an  elementary  row  operation  of  type  #2  by  a 
combination  of  elementary  row  operations  of  type  #1. 


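T 1.9.17. Show that (a) if $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is regular, then its pivots are a and $\dfrac{\det A}{a}$;

(b) if $A = \begin{pmatrix} a & b & e \\ c & d & f \\ g & h & j \end{pmatrix}$ is regular, then its pivots are a, $\dfrac{ad - bc}{a}$, and $\dfrac{\det A}{ad - bc}$.

(c) Can you generalize this observation to regular n × n matrices?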

C  1.9.18.  In  this  exercise,  we  justify  the  use  of  “elementary  column  operations”  to  compute 

determinants.  Prove  that  (a)  adding  a  scalar  multiple  of  one  column  to  another  does  not 
change  the  determinant;  (b)  multiplying  a  column  by  a  scalar  multiplies  the  determinant 
by  the  same  scalar;  (c)  interchanging  two  columns  changes  the  sign  of  the  determinant. 

(d)  Explain  how  to  use  elementary  column  operations  to  reduce  a  matrix  to  lower 
triangular  form  and  thereby  compute  its  determinant. 

0  1.9.19.  Find  the  determinant  of  the  Vandermonde  matrices  listed  in  Exercise  1.3.24.  Can  you 
guess  the  general  n  x  n  formula? 

C 1.9.20. Cramer’s Rule. (a) Show that the nonsingular system ax + by = p, cx + dy = q has
the solution given by the determinantal ratios

\[
x = \frac{1}{\Delta} \det \begin{pmatrix} p & b \\ q & d \end{pmatrix}, \qquad
y = \frac{1}{\Delta} \det \begin{pmatrix} a & p \\ c & q \end{pmatrix}, \qquad
\text{where} \quad \Delta = \det \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \tag{1.90}
\]

(b) Use Cramer’s Rule (1.90) to solve the systems
(i) $x + 3y = 13$, $4x + 2y = 0$; (ii) $x - 2y = 4$, $3x + 6y = -2$.

(c) Prove that the solution to

\[
\begin{aligned}
ax + by + cz &= p, \\
dx + ey + fz &= q, \\
gx + hy + jz &= r,
\end{aligned}
\qquad \text{with} \quad
\Delta = \det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & j \end{pmatrix} \ne 0,
\]

is

\[
x = \frac{1}{\Delta} \det \begin{pmatrix} p & b & c \\ q & e & f \\ r & h & j \end{pmatrix}, \qquad
y = \frac{1}{\Delta} \det \begin{pmatrix} a & p & c \\ d & q & f \\ g & r & j \end{pmatrix}, \qquad
z = \frac{1}{\Delta} \det \begin{pmatrix} a & b & p \\ d & e & q \\ g & h & r \end{pmatrix}. \tag{1.91}
\]

(d) Use Cramer’s Rule (1.91) to solve
(i) $x + 4y = 3$, $4x + 2y + z = 2$, $-x + y - z = 0$;
(ii) $3x + 2y - z = 1$, $x - 3y + 2z = 2$, $2x - y + z = 3$.

(e) Can you see the pattern that will generalize to n equations in n unknowns?

Remark.  Although  elegant,  Cramer’s  rule  is  not  a  very  practical  solution  method. 

0 1.9.21. (a) Show that if $D = \begin{pmatrix} A & O \\ O & B \end{pmatrix}$ is a block diagonal matrix, where A and B are square
matrices, then det D = det A det B. (b) Prove that the same holds for a block upper
triangular matrix: $\det \begin{pmatrix} A & C \\ O & B \end{pmatrix} = \det A \, \det B$. (c) Use this method to compute the
determinant of the following matrices:


Chapter 2

Vector  Spaces  and  Bases 

Vector  spaces  and  their  ancillary  structures  provide  the  common  language  of  linear  algebra, 
and,  as  such,  are  an  essential  prerequisite  for  understanding  contemporary  applied  (and 
pure)  mathematics.  The  key  concepts  of  vector  space,  subspace,  linear  independence,  span, 
and  basis  will  appear,  not  only  in  linear  systems  of  algebraic  equations  and  the  geometry 
of  n-dimensional  Euclidean  space,  but  also  in  the  analysis  of  linear  differential  equations, 
linear  boundary  value  problems,  Fourier  analysis,  signal  processing,  numerical  methods, 
and  many,  many  other  fields.  Therefore,  in  order  to  master  modern  linear  algebra  and  its 
applications,  the  first  order  of  business  is  to  acquire  a  firm  understanding  of  fundamental 
vector  space  constructions. 

One of the grand themes of mathematics is the recognition that many seemingly
unrelated entities are, in fact, different manifestations of the same underlying abstract structure.
This serves to unify and simplify the disparate special situations, at the expense of
introducing an extra level of abstraction. Indeed, the history of mathematics, as well as your
entire mathematical educational career, can be viewed as an evolution towards ever greater
abstraction resulting in ever greater power for solving problems. Here, the abstract
notion of a vector space serves to unify spaces of ordinary vectors, spaces of functions, such
as polynomials, exponentials, and trigonometric functions, as well as spaces of matrices,
spaces of linear operators, and so on, all in a common conceptual framework. Moreover,
proofs that might appear to be complicated in a particular context often turn out to be
relatively transparent when recast in the more inclusive vector space language. The price
that one pays for the increased level of abstraction is that, while the underlying
mathematics is not all that complicated, novices typically take a long time to assimilate the
underlying concepts. In our opinion, the best way to approach the subject is to think in
terms of concrete examples. First, make sure you understand what is being said in the case
of ordinary Euclidean space. Once this is grasped, the next important case to consider is
an elementary function space, e.g., the space of continuous scalar functions. With the two
most important cases firmly in hand, the leap to the general abstract formulation should
not be too painful. Patience is essential; ultimately, the only way to truly understand an
abstract concept like a vector space is by working with it in real-life applications! And
always keep in mind that the effort expended here will be amply rewarded later on.

Following  an  introduction  to  vector  spaces  and  subspaces,  we  develop  the  fundamental 
notions  of  span  and  linear  independence,  first  in  the  context  of  ordinary  vectors,  but  then 
in  more  generality,  with  an  emphasis  on  function  spaces.  These  are  then  combined  into 
the all-important definition of a basis of a vector space, leading to a linear algebraic characterization
of its dimension. Here is where the distinction between finite-dimensional and
infinite-dimensional  vector  spaces  first  becomes  apparent,  although  the  full  ramifications 
of this dichotomy will take time to unfold. We will then study the four fundamental subspaces
associated with a matrix (its image, kernel, coimage, and cokernel) and explain
how  they  help  us  understand  the  structure  and  the  solutions  of  linear  algebraic  systems. 
Of  particular  significance  is  the  linear  superposition  principle  that  enables  us  to  combine 
solutions to linear systems. Superposition is the hallmark of linearity, and will apply not
only  to  linear  algebraic  equations,  but  also  to  linear  ordinary  differential  equations,  linear 
boundary  value  problems,  linear  partial  differential  equations,  linear  integral  equations, 
linear control systems, etc. The final section in this chapter develops some interesting applications
in graph theory that serve to illustrate the fundamental matrix subspaces; these
results  will  be  developed  further  in  our  study  of  electrical  circuits. 

2.1  Real  Vector  Spaces 

A vector space is the abstract reformulation of the quintessential properties of n-dimensional†
Euclidean space $\mathbb{R}^n$, which is defined as the set of all real (column) vectors with
n entries. The basic laws of vector addition and scalar multiplication in $\mathbb{R}^n$ serve as the
template for the following general definition.

Definition 2.1. A vector space is a set V equipped with two operations:

(i) Addition: adding any pair of vectors $\mathbf{v}, \mathbf{w} \in V$ produces another vector $\mathbf{v} + \mathbf{w} \in V$;

(ii) Scalar Multiplication: multiplying a vector $\mathbf{v} \in V$ by a scalar $c \in \mathbb{R}$ produces a
vector $c\,\mathbf{v} \in V$.

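These are subject to the following axioms, valid for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and all scalars $c, d \in \mathbb{R}$:

(a) Commutativity of Addition: $\mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v}$.

(b) Associativity of Addition: $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$.

(c) Additive Identity: There is a zero element $\mathbf{0} \in V$ satisfying $\mathbf{v} + \mathbf{0} = \mathbf{v} = \mathbf{0} + \mathbf{v}$.

(d) Additive Inverse: For each $\mathbf{v} \in V$ there is an element $-\mathbf{v} \in V$ such that
$\mathbf{v} + (-\mathbf{v}) = \mathbf{0} = (-\mathbf{v}) + \mathbf{v}$.

(e) Distributivity: $(c + d)\,\mathbf{v} = (c\,\mathbf{v}) + (d\,\mathbf{v})$, and $c\,(\mathbf{v} + \mathbf{w}) = (c\,\mathbf{v}) + (c\,\mathbf{w})$.

(f) Associativity of Scalar Multiplication: $c\,(d\,\mathbf{v}) = (c\,d)\,\mathbf{v}$.

(g) Unit for Scalar Multiplication: the scalar $1 \in \mathbb{R}$ satisfies $1\,\mathbf{v} = \mathbf{v}$.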

Remark.  For  most  of  this  text,  we  will  deal  with  real  vector  spaces,  in  which  the  scalars 
are  ordinary  real  numbers,  as  indicated  in  the  definition.  Complex  vector  spaces,  where 
complex  scalars  are  allowed,  will  be  introduced  in  Section  3.6.  Vector  spaces  over  other 
fields are studied in abstract algebra, [38].

In the beginning, we will refer to the individual elements of a vector space as “vectors”,
even  though,  as  we  shall  see,  they  might  also  be  functions,  or  matrices,  or  even  more  general 
objects.  Unless  we  are  dealing  with  certain  specific  examples  such  as  a  space  of  functions 
or matrices, we will use bold face, lower case Latin letters $\mathbf{v}, \mathbf{w}, \ldots$ to denote the elements
of our vector space. We will usually use a bold face $\mathbf{0}$ to denote the unique‡ zero element
of our vector space, while ordinary 0 denotes the real number zero.

The  following  identities  are  elementary  consequences  of  the  vector  space  axioms: 

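(h) $0\,\mathbf{v} = \mathbf{0}$;

(i) $(-1)\,\mathbf{v} = -\mathbf{v}$;

(j) $c\,\mathbf{0} = \mathbf{0}$;

(k) If $c\,\mathbf{v} = \mathbf{0}$, then either $c = 0$ or $\mathbf{v} = \mathbf{0}$.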


† The precise definition of dimension will appear later, in Theorem 2.29.
‡ See Exercise 2.1.12.
[Figure 2.1. Vector Space Operations in $\mathbb{R}^n$. Panels: Vector Addition; Scalar Multiplication.]


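Let us, for example, prove (h). Let $\mathbf{z} = 0\,\mathbf{v}$. Then, by the distributive property,

$$\mathbf{z} + \mathbf{z} = 0\,\mathbf{v} + 0\,\mathbf{v} = (0 + 0)\,\mathbf{v} = 0\,\mathbf{v} = \mathbf{z}.$$

Adding $-\mathbf{z}$ to both sides of this equation, and making use of axioms (b), (d), and then
(c), we conclude that

$$\mathbf{z} = \mathbf{z} + \mathbf{0} = \mathbf{z} + (\mathbf{z} + (-\mathbf{z})) = (\mathbf{z} + \mathbf{z}) + (-\mathbf{z}) = \mathbf{z} + (-\mathbf{z}) = \mathbf{0},$$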


which  completes  the  proof.  Verification  of  the  other  three  properties  is  left  as  an  exercise 
for  the  reader. 

Let  us  now  introduce  the  most  important  examples  of  (real)  vector  spaces. 

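Example 2.2. As noted above, the prototypical example of a real vector space is the
Euclidean space $\mathbb{R}^n$, consisting of all n-tuples of real numbers $\mathbf{v} = (v_1, v_2, \ldots, v_n)^T$, which
we consistently write as column vectors. Vector addition and scalar multiplication are
defined in the usual manner:

$$\mathbf{v} + \mathbf{w} = \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{pmatrix}, \qquad c\,\mathbf{v} = \begin{pmatrix} c\,v_1 \\ c\,v_2 \\ \vdots \\ c\,v_n \end{pmatrix}, \qquad \text{whenever} \quad \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \quad \mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}, \quad c \in \mathbb{R}.$$

The zero vector is $\mathbf{0} = (0, \ldots, 0)^T$. The two vector space operations are illustrated in
Figure 2.1. The fact that vectors in $\mathbb{R}^n$ satisfy all of the vector space axioms is an immediate
consequence of the laws of vector addition and scalar multiplication.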
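As a concrete check of the statement above, the following minimal sketch spot-checks axioms (a) through (g) of Definition 2.1 for a few vectors in $\mathbb{R}^3$. The helper names `add` and `scale` and the sample vectors are illustrative choices, not part of the text; exact rational arithmetic is used so that equality tests are not disturbed by rounding. A finite check of this kind illustrates the axioms but of course does not prove them.

```python
from fractions import Fraction

def add(v, w):
    """Componentwise sum of two vectors with the same number of entries."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of entries")
    return [vi + wi for vi, wi in zip(v, w)]

def scale(c, v):
    """Scalar multiple c*v, computed entry by entry."""
    return [c * vi for vi in v]

# Sample data (hypothetical values, chosen only for illustration).
u = [Fraction(1), Fraction(-2), Fraction(3)]
v = [Fraction(1, 2), Fraction(0), Fraction(4)]
w = [Fraction(-1), Fraction(5), Fraction(2, 3)]
zero = [Fraction(0)] * 3
c, d = Fraction(3), Fraction(-1, 2)

axioms_hold = all([
    add(v, w) == add(w, v),                                # (a) commutativity
    add(u, add(v, w)) == add(add(u, v), w),                # (b) associativity
    add(v, zero) == v,                                     # (c) additive identity
    add(v, scale(-1, v)) == zero,                          # (d) additive inverse
    scale(c + d, v) == add(scale(c, v), scale(d, v)),      # (e) distributivity
    scale(c, add(v, w)) == add(scale(c, v), scale(c, w)),  # (e) distributivity
    scale(c, scale(d, v)) == scale(c * d, v),              # (f) associativity
    scale(1, v) == v,                                      # (g) unit
])
print(axioms_hold)
```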


Example 2.3. Let $\mathcal{M}_{m \times n}$ denote the space of all real matrices of size $m \times n$. Then
$\mathcal{M}_{m \times n}$ forms a vector space under the laws of matrix addition and scalar multiplication.
The zero element is the zero matrix O. (We are ignoring matrix multiplication, which is not
a vector space operation.) Again, the vector space axioms are immediate consequences of
the basic laws of matrix arithmetic. The preceding example of the vector space $\mathbb{R}^n = \mathcal{M}_{n \times 1}$
is a particular case in which the matrices have only one column.


Example 2.4. Consider the space

$$\mathcal{P}^{(n)} = \{\, p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \,\} \tag{2.1}$$

consisting of all real polynomials of degree $\le n$. Addition of polynomials is defined in the
usual manner; for example,

$$(x^2 - 3x) + (2x^2 - 5x + 4) = 3x^2 - 8x + 4.$$

Note that the sum $p(x) + q(x)$ of two polynomials of degree $\le n$ also has degree $\le n$. The
zero element of $\mathcal{P}^{(n)}$ is the zero polynomial. We can multiply polynomials by scalars, that is, real
constants, in the usual fashion; for example if $p(x) = x^2 - 2x$, then $3\,p(x) = 3x^2 - 6x$.
The proof that $\mathcal{P}^{(n)}$ satisfies the vector space axioms is an easy consequence of the basic
laws of polynomial algebra.

[Figure 2.2. Vector Space Operations in Function Space. Panels: Scalar Multiplication, $2\,f(x)$; Addition, $f(x) + g(x)$.]

Warning. It is not true that the sum of two polynomials of degree n also has degree n.
For example, $(x^2 + 1) + (-x^2 + x) = x + 1$ has degree 1 even though the two summands
have degree 2. This means that the set of polynomials of degree exactly n is not a vector space.
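The polynomial arithmetic above can be sketched in a few lines of code. The representation is an assumption made for this illustration: a polynomial $a_0 + a_1 x + \cdots + a_n x^n$ is stored as the coefficient list `[a0, a1, ..., an]`, and the helper names `poly_add`, `poly_scale`, and `degree` are hypothetical. The sketch reproduces the sum and scalar multiple computed in Example 2.4 and the degree drop described in the warning.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists [a0, a1, ..., an]."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))   # pad the shorter list with zero coefficients
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_scale(c, p):
    """Multiply every coefficient by the scalar c."""
    return [c * a for a in p]

def degree(p):
    """Index of the highest nonzero coefficient.
    Convention used only in this sketch: the zero polynomial gets -1."""
    for k in range(len(p) - 1, -1, -1):
        if p[k] != 0:
            return k
    return -1

s = poly_add([0, -3, 1], [4, -5, 2])   # (x^2 - 3x) + (2x^2 - 5x + 4)
m = poly_scale(3, [0, -2, 1])          # 3 (x^2 - 2x)
t = poly_add([1, 0, 1], [0, 1, -1])    # (x^2 + 1) + (-x^2 + x)

print(s, m, t, degree(t))
```

Both summands of `t` have degree 2, yet `degree(t)` is 1, which is why the polynomials of degree exactly n are not closed under addition.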

Warning.  You  might  be  tempted  to  identify  a  scalar  with  a  constant  polynomial,  but  one 
should really regard these as two completely different objects: one is a number, while the
other is a constant function. To add to the confusion, one typically uses the same notation
for  these  two  objects;  for  instance,  0  could  mean  either  the  real  number  0  or  the  constant 
function  taking  the  value  0  everywhere,  which  is  the  zero  element,  0,  of  this  vector  space. 
The  reader  needs  to  exercise  due  care  when  interpreting  each  occurrence. 

For much of analysis, including differential equations, Fourier theory, numerical methods,
etc., the most important vector spaces consist of functions that have certain prescribed
properties.  The  simplest  such  example  is  the  following. 
Example 2.5. Let $I \subset \mathbb{R}$ be an interval†. Consider the function space $\mathcal{F} = \mathcal{F}(I)$ whose
elements are all real-valued functions $f(x)$ defined for all $x \in I$. The claim is that the
function space $\mathcal{F}$ has the structure of a vector space. Addition of functions in $\mathcal{F}$ is defined
in the usual manner: $(f + g)(x) = f(x) + g(x)$ for all $x \in I$. Multiplication by scalars $c \in \mathbb{R}$
is the same as multiplication by constants, $(c\,f)(x) = c\,f(x)$. The zero element is the zero
function, i.e., the constant function that is identically 0 for all $x \in I$. The proof of the vector
space axioms is straightforward. Observe that we are ignoring all additional operations
that affect functions such as multiplication, division, inversion, composition, etc.; these are
irrelevant as far as the vector space structure of $\mathcal{F}$ goes.

Example 2.6. The preceding examples are all, in fact, special cases of an even more
general construction. A clue is to note that the last example of a function space does not
make any use of the fact that the domain of the functions is a real interval. Indeed, the
same construction produces a function space $\mathcal{F}(I)$ corresponding to any subset $I \subset \mathbb{R}$.

Even more generally, let S be any set. Let $\mathcal{F} = \mathcal{F}(S)$ denote the space of all real-valued
functions $f\colon S \to \mathbb{R}$. Then we claim that $\mathcal{F}$ is a vector space under the operations of
function addition and scalar multiplication. More precisely, given functions f and g, we
define their sum to be the function $h = f + g$ such that $h(x) = f(x) + g(x)$ for all $x \in S$.
Similarly, given a function f and a real scalar $c \in \mathbb{R}$, we define the scalar multiple $g = c\,f$
to be the function such that $g(x) = c\,f(x)$ for all $x \in S$. The proof of the vector space
axioms is straightforward, and the reader should be able to fill in the necessary details.
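The pointwise definitions above translate directly into code. In the following minimal sketch, a function on S is any Python callable, and the names `f_add`, `f_scale`, and the three-element set `S` of strings are hypothetical choices made to emphasize that S need not consist of numbers.

```python
def f_add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def f_scale(c, f):
    """Pointwise scalar multiple: (c f)(x) = c f(x)."""
    return lambda x: c * f(x)

# The zero element is the function that vanishes at every point of S.
zero = lambda x: 0

# S may be any set; here its elements are strings (an illustrative choice).
S = ["a", "b", "c"]
f = {"a": 1, "b": -2, "c": 5}.get   # a real-valued function on S
g = {"a": 4, "b": 0, "c": -1}.get   # another one

h = f_add(f, f_scale(3, g))         # the function f + 3 g
values = [h(x) for x in S]
print(values)
```

Only the values of the functions are ever added or scaled, so the structure of S itself plays no role.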

In particular, if $S \subset \mathbb{R}$ is an interval, then $\mathcal{F}(S)$ coincides with the space of scalar
functions described in the preceding example. If $S \subset \mathbb{R}^n$ is a subset of Euclidean space,
then the elements of $\mathcal{F}(S)$ are real-valued functions $f(x_1, \ldots, x_n)$ depending upon the n
variables corresponding to the coordinates of points $\mathbf{x} = (x_1, \ldots, x_n) \in S$ in the domain.
In this fashion, the set of real-valued functions defined on any domain in $\mathbb{R}^n$ forms a vector
space.

Another useful example is to let $S = \{x_1, \ldots, x_n\} \subset \mathbb{R}$ be a finite set of real numbers. A
real-valued function $f\colon S \to \mathbb{R}$ is defined by its values $f(x_1), f(x_2), \ldots, f(x_n)$ at the specified
points. In applications, these objects serve to indicate the sample values of a scalar function
$f(x) \in \mathcal{F}(\mathbb{R})$ taken at the sample points $x_1, \ldots, x_n$. For example, if $f(x) = x^2$ and the
sample points are $x_1 = 0$, $x_2 = 1$, $x_3 = 2$, $x_4 = 3$, then the corresponding sample
values are $f(x_1) = 0$, $f(x_2) = 1$, $f(x_3) = 4$, $f(x_4) = 9$. When measuring a physical
quantity (velocity, temperature, pressure, etc.), one typically records only a finite set
of sample values. The intermediate, non-recorded values between the sample points are
then reconstructed through some form of interpolation, a topic that we shall visit in depth
in Chapters 4 and 5.

Interestingly, the sample values $f(x_i)$ can be identified with the entries $f_i$ of a vector

$$\mathbf{f} = (f_1, f_2, \ldots, f_n)^T = (f(x_1), f(x_2), \ldots, f(x_n))^T \in \mathbb{R}^n,$$


† An interval is a subset $I \subset \mathbb{R}$ that contains all the real numbers between $a, b \in \mathbb{R}$, where $a < b$,
and can be

• closed, meaning that it includes its endpoints: $I = [a, b] = \{\, x \mid a \le x \le b \,\}$;

• open, which does not include either endpoint: $I = (a, b) = \{\, x \mid a < x < b \,\}$; or

• half-open, which includes one but not the other endpoint, so $I = [a, b) = \{\, x \mid a \le x < b \,\}$
or $I = (a, b] = \{\, x \mid a < x \le b \,\}$.

An open endpoint is allowed to be infinite; in particular, $(-\infty, \infty) = \mathbb{R}$ is another way of writing
the entire real line.
[Figure 2.3. Sampled Function.]


known as the sample vector. Every sampled function $f\colon S \to \mathbb{R}$ corresponds to a unique
vector $\mathbf{f} \in \mathbb{R}^n$ and vice versa. (But keep in mind that different scalar functions $f(x) \in \mathcal{F}(\mathbb{R})$
may have the same sample values.) Addition of sample functions corresponds to addition
of their sample vectors, as does scalar multiplication. Thus, the vector space of sample
functions $\mathcal{F}(S) = \mathcal{F}\{x_1, \ldots, x_n\}$ is the same as the vector space $\mathbb{R}^n$! The identification
of sampled functions as vectors is of fundamental importance in modern signal processing
and data analysis, as we will see below.
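The correspondence between sampled functions and vectors can be illustrated with a short sketch. It samples $f(x) = x^2$ at the points 0, 1, 2, 3 used in the text; the helper name `sample` and the second function `g`, which is built to agree with f at the sample points while differing elsewhere, are illustrative assumptions.

```python
def sample(f, points):
    """Sample vector of f: the list of values f(x_1), ..., f(x_n)."""
    return [f(x) for x in points]

points = [0, 1, 2, 3]
f = lambda x: x ** 2
# g differs from f away from the sample points, because the added product
# vanishes exactly at x = 0, 1, 2, 3.
g = lambda x: x ** 2 + x * (x - 1) * (x - 2) * (x - 3)
h = lambda x: 2 * x + 1

f_vec = sample(f, points)
g_vec = sample(g, points)

# Sampling the sum f + h gives the sum of the two sample vectors.
sum_then_sample = sample(lambda x: f(x) + h(x), points)
sample_then_sum = [a + b for a, b in zip(sample(f, points), sample(h, points))]

print(f_vec, g_vec, sum_then_sample)
```

Here `f_vec` and `g_vec` coincide even though `f(4)` and `g(4)` do not, which is the caveat in the parenthetical remark above.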


Example 2.7. The above construction admits yet a further generalization. We continue
to let S be an arbitrary set. Let V be a vector space. The claim is that the space $\mathcal{F}(S, V)$
consisting of all V-valued functions $\mathbf{f}\colon S \to V$ is a vector space. In other words, we replace
the particular vector space $\mathbb{R}$ in the preceding example by a general vector space V, and
the same conclusion holds. The operations of function addition and scalar multiplication
are defined in the evident manner: $(\mathbf{f} + \mathbf{g})(x) = \mathbf{f}(x) + \mathbf{g}(x)$ and $(c\,\mathbf{f})(x) = c\,\mathbf{f}(x)$ for $x \in S$,
where we are using the vector addition and scalar multiplication operations on V to induce
corresponding operations on V-valued functions. The proof that $\mathcal{F}(S, V)$ satisfies all of the
vector space axioms proceeds as before.

The most important example of such a function space arises when $S \subset \mathbb{R}^n$ is a domain
in Euclidean space and $V = \mathbb{R}^m$ is itself a Euclidean space. In this case, the
elements of $\mathcal{F}(S, \mathbb{R}^m)$ consist of vector-valued functions $\mathbf{f}\colon S \to \mathbb{R}^m$, so that
$\mathbf{f}(\mathbf{x}) = (\, f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n) \,)^T$ is a column vector consisting of m functions of n
variables, all defined on a common domain S. The general construction implies that
addition and scalar multiplication of vector-valued functions is done componentwise; for
example

$$2 \begin{pmatrix} x^2 \\ e^x - 4 \end{pmatrix} - \begin{pmatrix} \cos x \\ x \end{pmatrix} = \begin{pmatrix} 2x^2 - \cos x \\ 2e^x - x - 8 \end{pmatrix}.$$

Of  particular  importance  are  the  vector  fields  arising  in  physics,  including  gravitational 
force  fields,  electromagnetic  fields,  fluid  velocity  fields,  and  many  others. 
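Componentwise arithmetic of vector-valued functions can be sketched as follows. The functions `f` and `g` below are assumptions chosen so that the combination $2\,\mathbf{f} - \mathbf{g}$ has components $2x^2 - \cos x$ and $2e^x - x - 8$; the helper names `vf_add` and `vf_scale` are likewise hypothetical.

```python
import math

def vf_add(f, g):
    """Sum of two vector-valued functions, computed component by component."""
    return lambda x: [fi + gi for fi, gi in zip(f(x), g(x))]

def vf_scale(c, f):
    """Scalar multiple of a vector-valued function."""
    return lambda x: [c * fi for fi in f(x)]

# Illustrative functions from R to R^2 (assumed for this sketch).
f = lambda x: [x ** 2, math.exp(x) - 4]
g = lambda x: [math.cos(x), x]

combo = vf_add(vf_scale(2, f), vf_scale(-1, g))   # the function 2 f - g
expected = lambda x: [2 * x ** 2 - math.cos(x), 2 * math.exp(x) - x - 8]

for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, combo(x), expected(x))
```

The two printed vectors agree at each test point up to floating-point rounding.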


Exercises 

2.1.1. Show that the set of complex numbers $x + i\,y$ forms a real vector space under the
operations of addition $(x + i\,y) + (u + i\,v) = (x + u) + i\,(y + v)$ and scalar multiplication
$c\,(x + i\,y) = c\,x + i\,c\,y$. (But complex multiplication is not a real vector space operation.)

2.1.2. Show that the positive quadrant $Q = \{\, (x, y) \mid x, y > 0 \,\} \subset \mathbb{R}^2$ forms a vector space
if we define addition by $(x_1, y_1) + (x_2, y_2) = (x_1 x_2, y_1 y_2)$ and scalar multiplication by
$c\,(x, y) = (x^c, y^c)$.


0 2.1.3. Let S be any set. Carefully justify the validity of all the vector space axioms for the
space $\mathcal{F}(S)$ consisting of all real-valued functions $f\colon S \to \mathbb{R}$.

2.1.4. Let $S = \{0, 1, 2, 3\}$. (a) Find the sample vectors corresponding to the functions 1,
$\cos \pi x$, $\cos 2\pi x$, $\cos 3\pi x$. (b) Is a function uniquely determined by its sample values?

2.1.5. Find two different functions $f(x)$ and $g(x)$ that have the same sample vectors $\mathbf{f}, \mathbf{g}$ at the
sample points $x_1 = 0$, $x_2 = 1$, $x_3 = -1$.

2.1.6. (a) Let $x_1 = 0$, $x_2 = 1$. Find the unique linear function $f(x) = a\,x + b$ that has the
sample vector $\mathbf{f} = (3, -1)^T$. (b) Let $x_1 = 0$, $x_2 = 1$, $x_3 = -1$. Find the unique quadratic
function $f(x) = a\,x^2 + b\,x + c$ with sample vector $\mathbf{f} = (1, -2, 0)^T$.


2.1.7. Let $\mathcal{F}(\mathbb{R}^2, \mathbb{R}^2)$ denote the vector space consisting of all functions $\mathbf{f}\colon \mathbb{R}^2 \to \mathbb{R}^2$.
(a) Which of the following functions $\mathbf{f}(x, y)$ are elements? (i) $x^2 + y^2$, (ii) $\begin{pmatrix} x - y \\ x\,y \end{pmatrix}$,
(iii) $\begin{pmatrix} \ldots \\ \cos y \end{pmatrix}$, (iv) $\begin{pmatrix} 1 \\ 3 \end{pmatrix}$, (v) $\begin{pmatrix} x & y \\ -y & x \end{pmatrix}$, (vi) $\begin{pmatrix} x \\ y \\ x + y \end{pmatrix}$. (b) Sum all of the elements
of $\mathcal{F}(\mathbb{R}^2, \mathbb{R}^2)$ you identified in part (a). Then multiply your sum by the scalar $-5$.
(c) Carefully describe the zero element of the vector space $\mathcal{F}(\mathbb{R}^2, \mathbb{R}^2)$.


0 2.1.8. A planar vector field is a function that assigns a vector $\mathbf{v}(x, y)$ to each
point $(x, y)^T \in \mathbb{R}^2$. Explain why the set of all planar vector fields forms a vector space.


C 2.1.9. Let $h, k > 0$ be fixed. Let $S = \{\, (i\,h, j\,k) \mid 1 \le i \le m,\ 1 \le j \le n \,\}$ be
points in a rectangular planar grid. Show that the function space $\mathcal{F}(S)$ can
be identified with the vector space of $m \times n$ matrices $\mathcal{M}_{m \times n}$.


2.1.10. The space $\mathbb{R}^\infty$ is defined as the set of all infinite real sequences $\mathbf{a} = (a_1, a_2, a_3, \ldots)$,
where $a_i \in \mathbb{R}$. Define addition and scalar multiplication in such a way as to make $\mathbb{R}^\infty$ into
a vector space. Explain why all the vector space axioms are valid.

2.1.11.  Prove  the  basic  vector  space  properties  (i),  (j),  (k)  following  Definition  2.1. 

0  2.1.12.  Prove  that  a  vector  space  has  only  one  zero  element  0. 

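0 2.1.13. Suppose that V and W are vector spaces. The Cartesian product space, denoted by
$V \times W$, is defined as the set of all ordered pairs $(\mathbf{v}, \mathbf{w})$, where $\mathbf{v} \in V$, $\mathbf{w} \in W$, with vector
addition $(\mathbf{v}, \mathbf{w}) + (\hat{\mathbf{v}}, \hat{\mathbf{w}}) = (\mathbf{v} + \hat{\mathbf{v}}, \mathbf{w} + \hat{\mathbf{w}})$ and scalar multiplication $c\,(\mathbf{v}, \mathbf{w}) = (c\,\mathbf{v}, c\,\mathbf{w})$.
(a) Prove that $V \times W$ is a vector space. (b) Explain why $\mathbb{R} \times \mathbb{R}$ is the same as $\mathbb{R}^2$.
(c) More generally, explain why $\mathbb{R}^m \times \mathbb{R}^n$ is the same as $\mathbb{R}^{m+n}$.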

2.1.14. Use Exercise 2.1.13 to show that the space of pairs $(f(x), a)$, where f is a continuous
scalar  function  and  a  is  a  real  number,  is  a  vector  space.  What  is  the  zero  element?  Be 
precise!  Write  out  the  laws  of  vector  addition  and  scalar  multiplication. 


2.2  Subspaces 

In  the  preceding  section,  we  were  introduced  to  the  most  basic  vector  spaces  that  arise  in 
this  text.  Almost  all  of  the  vector  spaces  used  in  applications  appear  as  subsets  of  these 
prototypical  examples. 
Definition 2.8. A subspace of a vector space V is a subset $W \subset V$ that is a vector space
in its own right, under the same operations of vector addition and scalar multiplication
and the same zero element.

In  particular,  a  subspace  W  must  contain  the  zero  element  of  V.  Proving  that  a  given 
subset  of  a  vector  space  is  a  subspace  is  particularly  easy:  we  need  only  check  its  closure 
under  addition  and  scalar  multiplication. 

Proposition 2.9. A nonempty subset $W \subset V$ of a vector space is a subspace if and only if

(a) for every $\mathbf{v}, \mathbf{w} \in W$, the sum $\mathbf{v} + \mathbf{w} \in W$, and

(b) for every $\mathbf{v} \in W$ and every $c \in \mathbb{R}$, the scalar product $c\,\mathbf{v} \in W$.

Proof: The proof is immediate. For example, let us check commutativity. The subspace
elements $\mathbf{v}, \mathbf{w} \in W$ can be regarded as elements of V, in which case $\mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v}$ because
V is a vector space. But the closure condition implies that the common sum also belongs
to W, and so the commutativity axiom also holds for elements of W. Establishing the
validity of the other axioms is equally easy. Q.E.D.

It is sometimes convenient to combine the two closure conditions. Thus, to prove that
$W \subset V$ is a subspace, it suffices to check that $c\,\mathbf{v} + d\,\mathbf{w} \in W$ for all $\mathbf{v}, \mathbf{w} \in W$ and $c, d \in \mathbb{R}$.

Example 2.10. Let us list some examples of subspaces of the three-dimensional Euclidean
space $\mathbb{R}^3$.

(a) The trivial subspace $W = \{\mathbf{0}\}$. Demonstrating closure is easy: since there is only
one element $\mathbf{0}$ in W, we just need to check that $\mathbf{0} + \mathbf{0} = \mathbf{0} \in W$ and $c\,\mathbf{0} = \mathbf{0} \in W$ for every
scalar c.

(b) The entire space $W = \mathbb{R}^3$. Here closure is immediate because $\mathbb{R}^3$ is a vector space
in its own right.

(c) The set of all vectors of the form $(x, y, 0)^T$, i.e., the xy coordinate plane. To prove
closure, we check that all sums $(x, y, 0)^T + (\hat{x}, \hat{y}, 0)^T = (x + \hat{x}, y + \hat{y}, 0)^T$ and scalar
multiples $c\,(x, y, 0)^T = (c\,x, c\,y, 0)^T$ of vectors in the xy-plane remain in the plane.

(d) The set of solutions $(x, y, z)^T$ to the homogeneous linear equation

$$3x + 2y - z = 0. \tag{2.2}$$

Indeed, if $\mathbf{x} = (x, y, z)^T$ is a solution, then so is every scalar multiple $c\,\mathbf{x} = (c\,x, c\,y, c\,z)^T$,
since

$$3\,(c\,x) + 2\,(c\,y) - (c\,z) = c\,(3x + 2y - z) = 0.$$

Moreover, if $\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y}, \tilde{z})^T$ is a second solution, the sum $\mathbf{x} + \tilde{\mathbf{x}} = (x + \tilde{x}, y + \tilde{y}, z + \tilde{z})^T$
is also a solution, since

$$3\,(x + \tilde{x}) + 2\,(y + \tilde{y}) - (z + \tilde{z}) = (3x + 2y - z) + (3\tilde{x} + 2\tilde{y} - \tilde{z}) = 0.$$

The solution space is, in fact, the two-dimensional plane passing through the origin with
normal vector $(3, 2, -1)^T$.

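(e) The set of all vectors lying in the plane spanned by the vectors $\mathbf{v}_1 = (2, -3, 0)^T$
and $\mathbf{v}_2 = (1, 0, 3)^T$. In other words, we consider all vectors of the form

$$\mathbf{v} = a\,\mathbf{v}_1 + b\,\mathbf{v}_2 = a \begin{pmatrix} 2 \\ -3 \\ 0 \end{pmatrix} + b \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} = \begin{pmatrix} 2a + b \\ -3a \\ 3b \end{pmatrix},$$

where $a, b \in \mathbb{R}$ are arbitrary scalars. If $\mathbf{v} = a\,\mathbf{v}_1 + b\,\mathbf{v}_2$ and $\mathbf{w} = \hat{a}\,\mathbf{v}_1 + \hat{b}\,\mathbf{v}_2$ are any two
vectors in the span, then so is

$$c\,\mathbf{v} + d\,\mathbf{w} = c\,(a\,\mathbf{v}_1 + b\,\mathbf{v}_2) + d\,(\hat{a}\,\mathbf{v}_1 + \hat{b}\,\mathbf{v}_2) = (a\,c + \hat{a}\,d)\,\mathbf{v}_1 + (b\,c + \hat{b}\,d)\,\mathbf{v}_2 = \tilde{a}\,\mathbf{v}_1 + \tilde{b}\,\mathbf{v}_2,$$

where $\tilde{a} = a\,c + \hat{a}\,d$, $\tilde{b} = b\,c + \hat{b}\,d$. This demonstrates that the span is a subspace of $\mathbb{R}^3$. The
reader might already have noticed that this subspace is the same plane defined by (2.2).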
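The closing observation, that the span of $(2, -3, 0)^T$ and $(1, 0, 3)^T$ lies in the plane $3x + 2y - z = 0$, can be spot-checked with a minimal sketch. The helper names `combo`, `on_plane`, and `lin`, and the particular coefficients tried, are illustrative assumptions; exact rational arithmetic keeps the equality tests reliable. The sketch only confirms that the sampled combinations satisfy the equation and stay closed under a further linear combination; it does not by itself prove that the two sets coincide.

```python
from fractions import Fraction
from itertools import product

v1 = (2, -3, 0)
v2 = (1, 0, 3)

def combo(a, b):
    """The vector a*v1 + b*v2 in the span."""
    return tuple(a * p + b * q for p, q in zip(v1, v2))

def on_plane(x):
    """True when x satisfies the homogeneous equation 3x + 2y - z = 0."""
    return 3 * x[0] + 2 * x[1] - x[2] == 0

def lin(c, v, d, w):
    """The linear combination c*v + d*w of two vectors in R^3."""
    return tuple(c * vi + d * wi for vi, wi in zip(v, w))

# A handful of coefficients (hypothetical values) for the spot check.
coeffs = [Fraction(-2), Fraction(0), Fraction(1, 3), Fraction(5)]
all_on_plane = all(on_plane(combo(a, b)) for a, b in product(coeffs, repeat=2))

# Closure: a combination of two elements of the span satisfies the equation too.
v = combo(1, 2)
w = combo(-3, Fraction(1, 2))
closed = on_plane(lin(4, v, -7, w))

print(all_on_plane, closed)
```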


Example 2.11. The following subsets of $\mathbb{R}^3$ are not subspaces.

(a) The set P of all vectors of the form $(x, y, 1)^T$, i.e., the plane parallel to the xy
coordinate plane passing through $(0, 0, 1)^T$. Indeed, $(0, 0, 0)^T \notin P$, which is the most basic
requirement for a subspace. In fact, neither of the closure axioms holds for this subset.

(b) The nonnegative orthant $\mathcal{O}^+ = \{\, x \ge 0,\ y \ge 0,\ z \ge 0 \,\}$. Although $\mathbf{0} \in \mathcal{O}^+$, and
the sum of two vectors in $\mathcal{O}^+$ also belongs to $\mathcal{O}^+$, multiplying by negative scalars takes us
outside the orthant, violating closure under scalar multiplication.

(c) The unit sphere $S_1 = \{\, x^2 + y^2 + z^2 = 1 \,\}$. Again, $\mathbf{0} \notin S_1$. More generally, curved
surfaces, such as the paraboloid $P = \{\, z = x^2 + y^2 \,\}$, are not subspaces. Although $\mathbf{0} \in P$,
most scalar multiples of elements of P do not belong to P. For example, $(1, 1, 2)^T \in P$,
but $2\,(1, 1, 2)^T = (2, 2, 4)^T \notin P$.
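Each of the failures just described can be exhibited with a one-line test. The following sketch encodes the four sets as membership predicates (the names `in_shifted_plane`, `in_orthant`, `in_sphere`, and `in_paraboloid` are hypothetical) and evaluates one counterexample for each; the vector $(1, 2, 3)^T$ used for the orthant is an arbitrary illustrative choice.

```python
def in_shifted_plane(v):
    """Plane of vectors (x, y, 1)."""
    return v[2] == 1

def in_orthant(v):
    """Nonnegative orthant: every entry is >= 0."""
    return all(c >= 0 for c in v)

def in_sphere(v):
    """Unit sphere x^2 + y^2 + z^2 = 1."""
    return sum(c * c for c in v) == 1

def in_paraboloid(v):
    """Paraboloid z = x^2 + y^2."""
    return v[2] == v[0] ** 2 + v[1] ** 2

def scale(c, v):
    return tuple(c * x for x in v)

zero = (0, 0, 0)

checks = {
    "shifted plane contains 0": in_shifted_plane(zero),
    "orthant contains -1 * (1, 2, 3)": in_orthant(scale(-1, (1, 2, 3))),
    "sphere contains 0": in_sphere(zero),
    "paraboloid contains 2 * (1, 1, 2)": in_paraboloid(scale(2, (1, 1, 2))),
}
for description, holds in checks.items():
    print(description, holds)
```

Every entry prints `False`, so each set violates at least one requirement for a subspace, even though $(1, 1, 2)^T$ itself lies on the paraboloid.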

In fact, there are only four fundamentally different types of subspaces $W \subset \mathbb{R}^3$ of
three-dimensional Euclidean space:

(i) the entire three-dimensional space $W = \mathbb{R}^3$,

(ii) a plane passing through the origin,

(iii) a line passing through the origin,

(iv) a point: the trivial subspace $W = \{\mathbf{0}\}$.

We can establish this observation by the following argument. If $W = \{\mathbf{0}\}$ contains only
the zero vector, then we are in case (iv). Otherwise, $W \subset \mathbb{R}^3$ contains a nonzero vector
$\mathbf{0} \ne \mathbf{v}_1 \in W$. But since W must contain all scalar multiples $c\,\mathbf{v}_1$ of this element, it includes
the entire line in the direction of $\mathbf{v}_1$. If W contains another vector $\mathbf{v}_2$ that does not lie
in the line through $\mathbf{v}_1$, then it must contain the entire plane $\{\, c\,\mathbf{v}_1 + d\,\mathbf{v}_2 \,\}$ spanned by
$\mathbf{v}_1, \mathbf{v}_2$. Finally, if there is a third vector $\mathbf{v}_3$ not contained in this plane, then we claim
that $W = \mathbb{R}^3$. This final fact will be an immediate consequence of general results in this
chapter, although the interested reader might try to prove it directly before proceeding.

Example 2.12. Let $I \subset \mathbb{R}$ be an interval, and let $\mathcal{F}(I)$ be the space of real-valued
functions $f\colon I \to \mathbb{R}$. Let us look at some of the most important examples of subspaces of
$\mathcal{F}(I)$. In each case, we need only verify the closure conditions to verify that the given subset
is indeed a subspace. In particular, the zero function belongs to each of the subspaces.

(a) The space $\mathcal{P}^{(n)}$ of polynomials of degree $\le n$, which we already encountered.

(b) The space $\mathcal{P}^{(\infty)} = \bigcup_{n \ge 0} \mathcal{P}^{(n)}$ consisting of all polynomials. Closure means that
the sum of any two polynomials is a polynomial, as is any scalar (constant) multiple of a
polynomial.

(c) The space $C^0(I)$ of all continuous functions. Closure of this subspace relies on
knowing that if $f(x)$ and $g(x)$ are continuous, then both $f(x) + g(x)$ and $c\,f(x)$, for any
$c \in \mathbb{R}$, are also continuous, which are two basic results from calculus, [2, 78].


(d) More restrictively, we can consider the subspace $C^n(I)$ consisting of all functions
$f(x)$ that have n continuous derivatives $f'(x), f''(x), \ldots, f^{(n)}(x)$ on† I. Again, we need to
know that if $f(x)$ and $g(x)$ have n continuous derivatives, then so does $c\,f(x) + d\,g(x)$ for
all $c, d \in \mathbb{R}$.

(e) The space $C^\infty(I) = \bigcap_{n \ge 0} C^n(I)$ of infinitely differentiable or smooth functions is
also a subspace. This can be proved directly, or it follows from the general fact that the
intersection of subspaces is a subspace, cf. Exercise 2.2.23.

(f) The space $\mathcal{A}(I)$ of analytic functions on the interval I. Recall that a function $f(x)$
is called analytic at a point a if it is smooth, and, moreover, its Taylor series

$$f(a) + f'(a)\,(x - a) + \tfrac{1}{2}\,f''(a)\,(x - a)^2 + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x - a)^n \tag{2.3}$$

converges to $f(x)$ for all x sufficiently close to a. (The series is not required to converge on
the entire interval I.) Not every smooth function is analytic, and so $\mathcal{A}(I) \subsetneq C^\infty(I)$. An
explicit example of a smooth but non-analytic function can be found in Exercise 2.2.30.

(g) The set of all mean zero functions. The mean or average of an integrable function
defined on a closed interval $I = [a, b]$ is the real number

$$\overline{f} = \frac{1}{b - a} \int_a^b f(x)\,dx.$$

In particular, f has mean zero if and only if $\int_a^b f(x)\,dx = 0$. Since $\overline{f + g} = \overline{f} + \overline{g}$, and
$\overline{c\,f} = c\,\overline{f}$, sums and scalar multiples of mean zero functions also have mean zero, proving
closure.

(h) Fix $x_0 \in I$. The set of all functions $f(x)$ that vanish at the point, $f(x_0) = 0$, is
a subspace. Indeed, if $f(x_0) = 0$ and $g(x_0) = 0$, then, clearly $(c\,f + d\,g)(x_0) = c\,f(x_0) +
d\,g(x_0) = 0$ for all $c, d \in \mathbb{R}$, proving closure. This example can evidently be generalized to
functions that vanish at several points, or even on an entire subset $S \subset I$.

(i) The set of all solutions $u = f(x)$ to the homogeneous linear differential equation

$$u'' + 2u' - 3u = 0.$$

Indeed, if $f(x)$ and $g(x)$ are solutions, then so are $f(x) + g(x)$ and $c\,f(x)$ for all $c \in \mathbb{R}$. Note
that we do not need to actually solve the equation to verify these claims! They follow
directly from linearity:

$$(f + g)'' + 2\,(f + g)' - 3\,(f + g) = (f'' + 2f' - 3f) + (g'' + 2g' - 3g) = 0,$$

$$(c\,f)'' + 2\,(c\,f)' - 3\,(c\,f) = c\,(f'' + 2f' - 3f) = 0.$$

Remark. In the last three examples, 0 is essential for the indicated set of functions to be
a subspace. The set of functions such that $f(x_0) = 1$, say, is not a subspace. The set of
functions with a given nonzero mean, say $\overline{f} = 3$, is also not a subspace. Nor is the set of
solutions to an inhomogeneous ordinary differential equation, say $u'' + 2u' - 3u = x - 3$.
None of these subsets contain the zero function, nor do they satisfy the closure conditions.
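The contrast between the homogeneous and inhomogeneous equations can be made concrete with a short numerical sketch. It goes beyond the text in one respect: it uses the explicit solutions $e^x$ and $e^{-3x}$ of $u'' + 2u' - 3u = 0$ (the roots of $r^2 + 2r - 3 = (r - 1)(r + 3)$) and the particular solution $u = -\tfrac{1}{3}x + \tfrac{7}{9}$ of the inhomogeneous equation, which are easily verified but are not given above. The function names `residual` and `L_affine` are hypothetical.

```python
import math

def residual(A, B, x):
    """u'' + 2u' - 3u at x, for u(x) = A e^x + B e^{-3x} (derivatives by hand)."""
    u = A * math.exp(x) + B * math.exp(-3 * x)
    du = A * math.exp(x) - 3 * B * math.exp(-3 * x)
    d2u = A * math.exp(x) + 9 * B * math.exp(-3 * x)
    return d2u + 2 * du - 3 * u

def L_affine(a, b, x):
    """u'' + 2u' - 3u at x, for the affine function u(x) = a x + b."""
    return 0 + 2 * a - 3 * (a * x + b)

# Homogeneous case: every linear combination has zero residual (up to rounding).
homogeneous_ok = all(
    abs(residual(A, B, x)) < 1e-9
    for A in (0.0, 2.0, -1.5)
    for B in (0.0, 1.0, -1.5)
    for x in (-1.0, 0.0, 0.5, 2.0)
)

# Inhomogeneous case: u_p solves L[u] = x - 3, but 2 u_p gives 2 (x - 3) instead.
a, b = -1 / 3, 7 / 9
x = 1.0
single = L_affine(a, b, x)           # approximately x - 3 = -2
doubled = L_affine(2 * a, 2 * b, x)  # approximately 2 (x - 3) = -4

print(homogeneous_ok, single, doubled)
```

Doubling a solution of the inhomogeneous equation doubles the right-hand side, so the result no longer solves the same equation, in agreement with the Remark.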


† We use one-sided derivatives at any endpoint belonging to the interval.
Exercises

2.2.1. (a) Prove that the set of all vectors $(x, y, z)^T$ such that $x - y + 4z = 0$ forms a subspace
of $\mathbb{R}^3$. (b) Explain why the set of all vectors that satisfy $x - y + 4z = 1$ does not form a
subspace.

2.2.2. Which of the following are subspaces of $\mathbb{R}^3$? Justify your answers! (a) The set of all
vectors $(x, y, z)^T$ satisfying $x + y + z + 1 = 0$. (b) The set of vectors of the form $(t, -t, 0)^T$
for $t \in \mathbb{R}$. (c) The set of vectors of the form $(r - s, r + 2s, -5)^T$ for $r, s \in \mathbb{R}$. (d) The
set of vectors whose first component equals 0. (e) The set of vectors whose last component
equals 1. (f) The set of all vectors $(x, y, z)^T$ with $x > y > z$. (g) The set of all solutions
to the equation $z = x - y$. (h) The set of all solutions to $z = x\,y$. (i) The set of all solutions
to $x^2 + y^2 + z^2 = 0$. (j) The set of all solutions to the system $x\,y = y\,z = x\,z$.

2.2.3. Graph the following subsets of $\mathbb{R}^3$ and use this to explain which are subspaces:
(a) The line $(t, -t, 3t)^T$ for $t \in \mathbb{R}$. (b) The helix $(\cos t, \sin t, t)^T$. (c) The surface
$x - 2y + 3z = 0$. (d) The unit ball $x^2 + y^2 + z^2 < 1$. (e) The cylinder $(y + 2)^2 + (z - 1)^2 = 5$.
(f) The intersection of the cylinders $(x - 1)^2 + y^2 = 1$ and $(x + 1)^2 + y^2 = 1$.

2.2.4. Show that if $W \subset \mathbb{R}^3$ is a subspace containing the vectors $(1, 2, -1)^T$, $(2, 0, 1)^T$,
$(0, -1, 3)^T$, then $W = \mathbb{R}^3$.

2.2.5.  True  or  false:  An  interval  is  a  vector  space. 

2.2.6. (a) Can you construct an example of a subset $S \subset \mathbb{R}^2$ with the property that $c\,\mathbf{v} \in S$
for all $c \in \mathbb{R}$, $\mathbf{v} \in S$, and yet $S$ is not a subspace? (b) What about an example in which
$\mathbf{v} + \mathbf{w} \in S$ for every $\mathbf{v}, \mathbf{w} \in S$, and yet $S$ is not a subspace?

2.2.7. Determine which of the following sets of vectors $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$ are subspaces of
$\mathbb{R}^n$: (a) all equal entries $x_1 = \cdots = x_n$; (b) all positive entries: $x_i > 0$; (c) first and last
entries equal to zero: $x_1 = x_n = 0$; (d) entries add up to zero: $x_1 + \cdots + x_n = 0$; (e) first
and last entries differ by one: $x_1 - x_n = 1$.

2.2.8.  Prove  that  the  set  of  all  solutions  x  of  the  linear  system  Ax  =  b  forms  a  subspace  if  and 
only  if  the  system  is  homogeneous. 

2.2.9.  A  square  matrix  is  called  strictly  lower  triangular  if  all  entries  on  or  above  the  main 
diagonal  are  0.  Prove  that  the  space  of  strictly  lower  triangular  matrices  is  a  subspace  of 
the  vector  space  of  all  n  x  n  matrices. 

2.2.10. Which of the following are subspaces of the vector space of $n \times n$ matrices $\mathcal{M}_{n \times n}$?
The set of all (a) regular matrices; (b) nonsingular matrices; (c) singular matrices;
(d) lower triangular matrices; (e) lower unitriangular matrices; (f) diagonal matrices;
(g) symmetric matrices; (h) skew-symmetric matrices.

♦ 2.2.11. The trace of an $n \times n$ matrix $A \in \mathcal{M}_{n \times n}$ is defined to be the sum of its diagonal
entries: $\operatorname{tr} A = a_{11} + a_{22} + \cdots + a_{nn}$. Prove that the set of trace zero matrices, $\operatorname{tr} A = 0$, is
a subspace of $\mathcal{M}_{n \times n}$.

2.2.12. (a) Is the set of $n \times n$ matrices with $\det A = 1$ a subspace of $\mathcal{M}_{n \times n}$?
(b) What about the matrices with $\det A = 0$?

2.2.13. Let $V = \mathrm{C}^0(\mathbb{R})$ be the vector space consisting of all continuous functions $f\colon \mathbb{R} \to \mathbb{R}$.
Explain why the set of all functions such that $f(1) = 0$ is a subspace, but the set of
functions such that $f(0) = 1$ is not. For which values of $a, b$ does the set of functions such
that $f(a) = b$ form a subspace?

2.2.14. Which of the following are vector spaces? Justify your answer! (a) The set of all row
vectors of the form $(a, 3a)$. (b) The set of all vectors of the form $(a, a + 1)$. (c) The set
of all continuous functions for which $f(-1) = 0$. (d) The set of all periodic functions of
period 1, i.e., $f(x + 1) = f(x)$. (e) The set of all non-negative functions: $f(x) \geq 0$.
(f) The set of all even polynomials: $p(x) = p(-x)$. (g) The set of all polynomials $p(x)$ that
have $x - 1$ as a factor. (h) The set of all quadratic forms $q(x, y) = a x^2 + b x y + c y^2$.

2.2.15. Determine which of the following conditions describe subspaces of the vector space $\mathrm{C}^1$
consisting of all continuously differentiable scalar functions $f(x)$.
(a) $f(2) = f(3)$, (b) $f'(2) = f(3)$, (c) $f'(x) + f(x) = 0$, (d) $f(2 - x) = f(x)$,
(e) $f(x + 2) = f(x) + 2$, (f) $f(-x) = e^x f(x)$, (g) $f(x) = a + b\,|x|$ for some $a, b \in \mathbb{R}$.

2.2.16. Let $V = \mathrm{C}^0[0, 1]$ be the vector space consisting of all functions $f(t)$ that are defined
and continuous on the interval $0 \leq t \leq 1$. Which of the following conditions define
subspaces of $V$? Explain your answer. (a) $f(0) = 0$, (b) $f(0) = 2 f(1)$, (c) $f(0)\,f(1) = 1$,
(d)  /(0)  =0  or  /(l)  =  0,  (e)  /(I  -  t)  =  (f)  f(l  -  t)  =  1  -  f(t), 

f  O  =  fQ  fX)dt,  ( h )  fQ  (t-l)f(t)dt  =  0,  (i)  /  /(s)  sin  s  ds  =  sint. 

2.2.17. Prove that the set of solutions to the second order ordinary differential equation
$u'' = x\,u$ is a vector space.

2.2.18. Show that the set of solutions to $u'' = x + u$ does not form a vector space.

2.2.19. (a) Prove that $\mathrm{C}^1([a, b], \mathbb{R}^2)$, which is the space of continuously differentiable
parameterized plane curves $\mathbf{f}\colon [a, b] \to \mathbb{R}^2$, is a vector space.

(b)  Is  the  subset  consisting  of  all  curves  that  go  through  the  origin  a  subspace? 


2.2.20. A planar vector field $\mathbf{v}(x, y) = (u(x, y), v(x, y))^T$ is called irrotational if it has zero
divergence: $\nabla \cdot \mathbf{v} = \dfrac{\partial u}{\partial x} + \dfrac{\partial v}{\partial y} = 0$. Prove that the set of all irrotational vector fields is a
subspace of the space of all planar vector fields.


2.2.21. Let $C \subset \mathbb{R}^\infty$ denote the set of all convergent sequences of real numbers, where $\mathbb{R}^\infty$ was
defined in Exercise 2.2.21. Is $C$ a subspace?


♦ 2.2.22. Show that if $W$ and $Z$ are subspaces of $V$, then (a) their intersection $W \cap Z$ is a
subspace of $V$, (b) their sum $W + Z = \{ \mathbf{w} + \mathbf{z} \mid \mathbf{w} \in W,\ \mathbf{z} \in Z \}$ is also a subspace, but
(c) their union $W \cup Z$ is not a subspace of $V$, unless $W \subset Z$ or $Z \subset W$.

♦ 2.2.23. Let $V$ be a vector space. Prove that the intersection $\bigcap W_i$ of any collection (finite or
infinite) of subspaces $W_i \subset V$ is a subspace.


♥ 2.2.24. Let $W \subset V$ be a subspace. A subspace $Z \subset V$ is called a complementary subspace to $W$
if (i) $W \cap Z = \{\mathbf{0}\}$, and (ii) $W + Z = V$, i.e., every $\mathbf{v} \in V$ can be written as $\mathbf{v} = \mathbf{w} + \mathbf{z}$ for
$\mathbf{w} \in W$ and $\mathbf{z} \in Z$. (a) Show that the x- and y-axes are complementary subspaces of $\mathbb{R}^2$.
(b) Show that the lines $x = y$ and $x = 3y$ are complementary subspaces of $\mathbb{R}^2$. (c) Show
that the line $(a, 2a, 3a)^T$ and the plane $x + 2y + 3z = 0$ are complementary subspaces of
$\mathbb{R}^3$. (d) Prove that if $\mathbf{v} = \mathbf{w} + \mathbf{z}$, then $\mathbf{w} \in W$ and $\mathbf{z} \in Z$ are uniquely determined.


2.2.25. (a) Show that $V_0 = \{ (\mathbf{v}, \mathbf{0}) \mid \mathbf{v} \in V \}$ and $W_0 = \{ (\mathbf{0}, \mathbf{w}) \mid \mathbf{w} \in W \}$ are complementary
subspaces, as in Exercise 2.2.24, of the Cartesian product space $V \times W$, as defined in
Exercise 2.1.13. (b) Prove that the diagonal $D = \{ (\mathbf{v}, \mathbf{v}) \}$ and the anti-diagonal
$A = \{ (\mathbf{v}, -\mathbf{v}) \}$ are complementary subspaces of $V \times V$.


2.2.26.  Show  that  the  set  of  skew-symmetric  n  x  n  matrices  forms  a  complementary  subspace 
to  the  set  of  symmetric  n  x  n  matrices.  Explain  why  this  implies  that  every  square  matrix 
can  be  uniquely  written  as  the  sum  of  a  symmetric  and  a  skew-symmetric  matrix. 


2.2.27. (a) Show that the set of even functions, $f(-x) = f(x)$, is a subspace of the vector space
of all functions $\mathcal{F}(\mathbb{R})$. (b) Show that the set of odd functions, $g(-x) = -g(x)$, forms a
complementary subspace, as defined in Exercise 2.2.24. (c) Explain why every function can
be uniquely written as the sum of an even function and an odd function.

♥ 2.2.28. Let $V$ be a vector space. A subset of the form $A = \{ \mathbf{w} + \mathbf{a} \mid \mathbf{w} \in W \}$, where $W \subset V$ is
a subspace and $\mathbf{a} \in V$ is a fixed vector, is known as an affine subspace of $V$. (a) Show that
an affine subspace $A \subset V$ is a genuine subspace if and only if $\mathbf{a} \in W$. (b) Draw the affine

2  rC]  q 

subspaces  A  C  R  when  (i)  W  is  the  x-axis  and  a  =  (2, 1 )  ,  (n)  W  is  the  line  y  =  ^  x 

and $\mathbf{a} = (1, 1)^T$, (iii) $W$ is the line $\{ (t, -t)^T \mid t \in \mathbb{R} \}$, and $\mathbf{a} = (2, -2)^T$. (c) Prove that

every affine subspace $A \subset \mathbb{R}^2$ is either a point, a line, or all of $\mathbb{R}^2$. (d) Show that the plane
$x - 2y + 3z = 1$ is an affine subspace of $\mathbb{R}^3$. (e) Show that the set of all polynomials such
that $p(0) = 1$ is an affine subspace of $\mathcal{P}^{(n)}$.


♥ 2.2.29. Quotient spaces: Let $V$ be a vector space and $W \subset V$ a subspace. We say that
two vectors $\mathbf{u}, \mathbf{v} \in V$ are equivalent modulo $W$ if $\mathbf{u} - \mathbf{v} \in W$. (a) Show that this
defines an equivalence relation, written $\mathbf{u} \sim_W \mathbf{v}$, on $V$, i.e., (i) $\mathbf{v} \sim_W \mathbf{v}$ for every $\mathbf{v}$;
(ii) if $\mathbf{u} \sim_W \mathbf{v}$, then $\mathbf{v} \sim_W \mathbf{u}$; and (iii) if, in addition, $\mathbf{v} \sim_W \mathbf{z}$, then $\mathbf{u} \sim_W \mathbf{z}$. (b) The
equivalence class of a vector $\mathbf{u} \in V$ is defined as the set of all equivalent vectors,
written $[\mathbf{u}]_W = \{ \mathbf{v} \in V \mid \mathbf{v} \sim_W \mathbf{u} \}$. Show that $[\mathbf{0}]_W = W$. (c) Let $V = \mathbb{R}^2$ and
$W = \{ (x, y)^T \mid x = 2y \}$. Sketch a picture of several equivalence classes as subsets of
$\mathbb{R}^2$. (d) Show that each equivalence class $[\mathbf{u}]_W$ for $\mathbf{u} \in V$ is an affine subspace of $V$, as in
Exercise 2.2.28. (e) Prove that the set of equivalence classes, called the quotient space and
denoted by $V/W = \{ [\mathbf{u}]_W \mid \mathbf{u} \in V \}$, forms a vector space under the operations of addition,
$[\mathbf{u}]_W + [\mathbf{v}]_W = [\mathbf{u} + \mathbf{v}]_W$, and scalar multiplication, $c\,[\mathbf{u}]_W = [c\,\mathbf{u}]_W$. What is the zero
element? Thus, you first need to prove that these operations are well defined, and then
demonstrate the vector space axioms.


♥ 2.2.30. Define
$$f(x) = \begin{cases} e^{-1/x}, & x > 0, \\ 0, & x \leq 0. \end{cases}$$
(a) Prove that all derivatives of $f$ vanish at the origin: $f^{(n)}(0) = 0$ for $n = 0, 1, 2, \ldots$.
(b) Prove that $f(x)$ is not analytic by showing that its Taylor series at $a = 0$ does not
converge to $f(x)$ when $x > 0$.

2.2.31. Let $f(x) = \dfrac{1}{1 + x^2}$. (a) Find the Taylor series of $f$ at $a = 0$. (b) Prove that the Taylor
series converges for $|x| < 1$, but diverges for $|x| > 1$. (c) Prove that $f(x)$ is analytic at x =


2.3  Span  and  Linear  Independence 

The definition of the span of a collection of elements of a vector space generalizes, in a
natural fashion, the geometric notion of two vectors spanning a plane in $\mathbb{R}^3$. As such, it
describes the first of two universal methods for constructing subspaces of vector spaces.

Definition 2.13. Let $\mathbf{v}_1, \ldots, \mathbf{v}_k$ be elements of a vector space $V$. A sum of the form
$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k = \sum_{i=1}^{k} c_i \mathbf{v}_i, \tag{2.5}$$
where the coefficients $c_1, c_2, \ldots, c_k$ are any scalars, is known as a linear combination of the
elements $\mathbf{v}_1, \ldots, \mathbf{v}_k$. Their span is the subset $W = \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} \subset V$ consisting of all
possible linear combinations with scalars $c_1, \ldots, c_k \in \mathbb{R}$.

Figure 2.4. Plane and Line Spanned by Two Vectors.


For  instance,  +  v2  —  2v3,  8v:  —  |v3  =  8v:  +  0v2  —  |v3,  v2  =  Ov-l  +  lv2  + 
$0\,\mathbf{v}_3$, and $\mathbf{0} = 0\,\mathbf{v}_1 + 0\,\mathbf{v}_2 + 0\,\mathbf{v}_3$ are four different linear combinations of the three vector
space elements $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3 \in V$.

The  key  observation  is  that  the  span  always  forms  a  subspace. 

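Proposition 2.14. The span $W = \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ of any finite collection of vector
space elements $\mathbf{v}_1, \ldots, \mathbf{v}_k \in V$ is a subspace of the underlying vector space $V$.

Proof: We need to show that $W$ is closed under addition and scalar multiplication. If
$$\mathbf{v} = c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k \quad \text{and} \quad \hat{\mathbf{v}} = \hat c_1 \mathbf{v}_1 + \cdots + \hat c_k \mathbf{v}_k$$
are any two linear combinations, then their sum is also a linear combination, since
$$\mathbf{v} + \hat{\mathbf{v}} = (c_1 + \hat c_1)\,\mathbf{v}_1 + \cdots + (c_k + \hat c_k)\,\mathbf{v}_k = \tilde c_1 \mathbf{v}_1 + \cdots + \tilde c_k \mathbf{v}_k,$$
where $\tilde c_i = c_i + \hat c_i$. Similarly, for any scalar multiple,
$$a\,\mathbf{v} = (a c_1)\,\mathbf{v}_1 + \cdots + (a c_k)\,\mathbf{v}_k = c_1^* \mathbf{v}_1 + \cdots + c_k^* \mathbf{v}_k,$$
where $c_i^* = a\,c_i$, which completes the proof. Q.E.D.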


Example 2.15. Examples of subspaces spanned by vectors in $\mathbb{R}^3$:

(i) If $\mathbf{v}_1 \neq \mathbf{0}$ is any non-zero vector in $\mathbb{R}^3$, then its span is the line $\{ c\,\mathbf{v}_1 \mid c \in \mathbb{R} \}$
consisting of all vectors parallel to $\mathbf{v}_1$. If $\mathbf{v}_1 = \mathbf{0}$, then its span just contains the
origin.

(ii) If $\mathbf{v}_1$ and $\mathbf{v}_2$ are any two vectors in $\mathbb{R}^3$, then their span is the set of all vectors of the
form $c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2$. Typically, such a span prescribes a plane passing through the
origin. However, if $\mathbf{v}_1$ and $\mathbf{v}_2$ are parallel, then their span is just a line. The most
degenerate case occurs when $\mathbf{v}_1 = \mathbf{v}_2 = \mathbf{0}$, where the span is just a point: the
origin.

(iii) If we are given three non-coplanar vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$, then their span is all of $\mathbb{R}^3$, as
we shall prove below. However, if they all lie in a plane, then their span is the
plane, unless they are all parallel, in which case their span is a line, or, in the
completely degenerate situation $\mathbf{v}_1 = \mathbf{v}_2 = \mathbf{v}_3 = \mathbf{0}$, a single point.

Thus, every subspace of $\mathbb{R}^3$ can be realized as the span of some set of vectors. One can
consider subspaces spanned by four or more vectors in $\mathbb{R}^3$, but these continue to be limited
to being either a point (the origin), a line, a plane, or the entire three-dimensional space.
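
The four possibilities in Example 2.15 can be told apart by counting pivots. The following is a minimal sketch, not part of the text: it assumes the vectors are given with integer or rational entries, and the helper names `rank` and `describe_span` are purely illustrative. It uses exact `Fraction` arithmetic so that no rounding tolerance is needed.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots of the matrix whose rows are the given vectors (exact arithmetic)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # look for a row at or below position r with a nonzero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def describe_span(vectors):
    """Name the geometric type of span{vectors} for vectors in R^3."""
    return ["point (the origin)", "line", "plane", "all of R^3"][rank(vectors)]

print(describe_span([(0, 0, 0)]))
print(describe_span([(1, 2, 3), (2, 4, 6)]))             # parallel vectors
print(describe_span([(1, 0, 0), (0, 1, 0), (1, 1, 0)]))  # coplanar vectors
print(describe_span([(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]))
```

The rank of the matrix equals the dimension of the span, which is why four or more vectors in $\mathbb{R}^3$ can never produce anything beyond these four cases.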
A crucial question is to determine when a given vector belongs to the span of a
prescribed collection.

Example 2.16. Let $W \subset \mathbb{R}^3$ be the plane spanned by the vectors $\mathbf{v}_1 = (1, -2, 1)^T$ and
$\mathbf{v}_2 = (2, -3, 1)^T$. Question: Is the vector $\mathbf{v} = (0, 1, -1)^T$ an element of $W$? To answer,
we need to see whether we can find scalars $c_1, c_2$ such that $\mathbf{v} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2$; that is,
$$\begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} = c_1 \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} + c_2 \begin{pmatrix} 2 \\ -3 \\ 1 \end{pmatrix} = \begin{pmatrix} c_1 + 2 c_2 \\ -2 c_1 - 3 c_2 \\ c_1 + c_2 \end{pmatrix}.$$
Thus, $c_1, c_2$ must satisfy the linear algebraic system
$$c_1 + 2 c_2 = 0, \qquad -2 c_1 - 3 c_2 = 1, \qquad c_1 + c_2 = -1.$$
Applying Gaussian Elimination, we find the solution $c_1 = -2$, $c_2 = 1$, and so $\mathbf{v} = -2\,\mathbf{v}_1 + \mathbf{v}_2$
does belong to the span. On the other hand, $\tilde{\mathbf{v}} = (1, 0, 0)^T$ does not belong to $W$. Indeed,
there are no scalars $c_1, c_2$ such that $\tilde{\mathbf{v}} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2$, because the corresponding linear
system is incompatible.
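
The computation in Example 2.16 can be checked mechanically. The sketch below is an illustration only: the function name `coefficients_in_span` is hypothetical, and it assumes integer or rational entries so that Gauss-Jordan elimination can be carried out exactly. It returns one choice of coefficients when the system is compatible and `None` when it is not.

```python
from fractions import Fraction

def coefficients_in_span(vectors, b):
    """Return scalars c with sum(c[j] * vectors[j]) == b, or None if b is not in the span."""
    n, k = len(b), len(vectors)
    # augmented matrix ( v_1 ... v_k | b ), one row per coordinate
    M = [[Fraction(vectors[j][i]) for j in range(k)] + [Fraction(b[i])] for i in range(n)]
    pivots = []  # pivots[r] is the column of the pivot in row r
    r = 0
    for col in range(k):
        p = next((i for i in range(r, n) if M[i][col] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][col] for x in M[r]]           # scale the pivot to 1
        for i in range(n):
            if i != r and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * c for a, c in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
    # a row of the form (0 ... 0 | nonzero) means the system is incompatible
    if any(M[i][k] != 0 for i in range(r, n)):
        return None
    c = [Fraction(0)] * k  # free variables, if any, are set to zero
    for row, col in enumerate(pivots):
        c[col] = M[row][k]
    return c

v1, v2 = (1, -2, 1), (2, -3, 1)
print(coefficients_in_span([v1, v2], (0, 1, -1)))  # coefficients c1, c2 from the example
print(coefficients_in_span([v1, v2], (1, 0, 0)))   # incompatible system
```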


Warning. It is entirely possible for different sets of vectors to span the same subspace.
For instance, $\mathbf{e}_1 = (1, 0, 0)^T$ and $\mathbf{e}_2 = (0, 1, 0)^T$ span the xy-plane in $\mathbb{R}^3$, as do the three
coplanar vectors $\mathbf{v}_1 = (1, -1, 0)^T$, $\mathbf{v}_2 = (-1, 2, 0)^T$, $\mathbf{v}_3 = (2, 1, 0)^T$.

Example 2.17. Let $V = \mathcal{F}(\mathbb{R})$ denote the space of all scalar functions $f(x)$.

(a) The span of the three monomials $f_1(x) = 1$, $f_2(x) = x$, and $f_3(x) = x^2$ is the set
of all functions of the form
$$f(x) = c_1 f_1(x) + c_2 f_2(x) + c_3 f_3(x) = c_1 + c_2 x + c_3 x^2,$$
where $c_1, c_2, c_3$ are arbitrary scalars (constants). In other words, $\operatorname{span}\{1, x, x^2\} = \mathcal{P}^{(2)}$
is the subspace of all quadratic (degree $\leq 2$) polynomials. In a similar fashion, the space $\mathcal{P}^{(n)}$
of polynomials of degree $\leq n$ is spanned by the monomials $1, x, x^2, \ldots, x^n$.

(b) The next example plays a key role in many applications. Let $0 \neq \omega \in \mathbb{R}$. Consider
the two basic trigonometric functions $f_1(x) = \cos \omega x$, $f_2(x) = \sin \omega x$ of frequency $\omega$, and
hence period $2\pi/\omega$. Their span consists of all functions of the form
$$f(x) = c_1 f_1(x) + c_2 f_2(x) = c_1 \cos \omega x + c_2 \sin \omega x. \tag{2.6}$$
For example, the function $\cos(\omega x + 2)$ lies in the span because, by the addition formula
for the cosine,
$$\cos(\omega x + 2) = (\cos 2) \cos \omega x - (\sin 2) \sin \omega x$$
is a linear combination of $\cos \omega x$ and $\sin \omega x$, with respective coefficients $\cos 2$, $-\sin 2$. Indeed,
we can express a general function in the span in the alternative phase-amplitude form
$$f(x) = c_1 \cos \omega x + c_2 \sin \omega x = r \cos(\omega x - \delta), \tag{2.7}$$
in which $r \geq 0$ is known as the amplitude and $0 \leq \delta < 2\pi$ the phase shift. Indeed,
expanding the right-hand side, we obtain
$$r \cos(\omega x - \delta) = (r \cos \delta) \cos \omega x + (r \sin \delta) \sin \omega x, \quad \text{and hence} \quad c_1 = r \cos \delta, \quad c_2 = r \sin \delta.$$
Figure 2.5. Graph of $3 \cos(2x - 1)$.


Thus, $(r, \delta)$ are the polar coordinates of the point $\mathbf{c} = (c_1, c_2) \in \mathbb{R}^2$ prescribed by the
coefficients. We conclude that every linear combination of $\sin \omega x$ and $\cos \omega x$ can be rewritten
as a single cosine containing an extra phase shift. Figure 2.5 shows the particular function
$3 \cos(2x - 1)$, which has amplitude $r = 3$, frequency $\omega = 2$, and phase shift $\delta = 1$. The
first peak appears at $x = \delta/\omega = \tfrac12$.
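
Since $(r, \delta)$ are the polar coordinates of $(c_1, c_2)$, the conversion to phase-amplitude form is a two-line computation. The following sketch is illustrative only (the function name `phase_amplitude` is an assumption, not notation from the text); it recovers the amplitude and phase shift of the function in Figure 2.5 and spot-checks identity (2.7) at a few sample points.

```python
import math

def phase_amplitude(c1, c2):
    """Return (r, delta) with c1*cos(w x) + c2*sin(w x) = r*cos(w x - delta), 0 <= delta < 2*pi."""
    r = math.hypot(c1, c2)                       # amplitude: distance of (c1, c2) from the origin
    delta = math.atan2(c2, c1) % (2 * math.pi)   # phase shift: polar angle, moved into [0, 2*pi)
    return r, delta

# coefficients of 3 cos(2x - 1) = (3 cos 1) cos 2x + (3 sin 1) sin 2x
omega = 2.0
c1, c2 = 3 * math.cos(1.0), 3 * math.sin(1.0)
r, delta = phase_amplitude(c1, c2)
print(r, delta, delta / omega)  # amplitude, phase shift, location of the first peak

# spot-check identity (2.7) at a few points
for x in (0.0, 0.3, 1.7, -2.2):
    lhs = c1 * math.cos(omega * x) + c2 * math.sin(omega * x)
    assert math.isclose(lhs, r * math.cos(omega * x - delta), abs_tol=1e-12)
```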

(c) The space $\mathcal{T}^{(2)}$ of quadratic trigonometric polynomials is spanned by the functions
$$1, \quad \cos x, \quad \sin x, \quad \cos^2 x, \quad \cos x \sin x, \quad \sin^2 x.$$
Its general element is a linear combination
$$q(x) = c_0 + c_1 \cos x + c_2 \sin x + c_3 \cos^2 x + c_4 \cos x \sin x + c_5 \sin^2 x, \tag{2.8}$$
where $c_0, \ldots, c_5$ are arbitrary constants. A more useful spanning set for the same subspace
consists of the trigonometric functions
$$1, \quad \cos x, \quad \sin x, \quad \cos 2x, \quad \sin 2x. \tag{2.9}$$
Indeed, by the double-angle formulas, both
$$\cos 2x = \cos^2 x - \sin^2 x, \qquad \sin 2x = 2 \sin x \cos x,$$
have the form of a quadratic trigonometric polynomial (2.8), and hence both belong to
$\mathcal{T}^{(2)}$. On the other hand, we can write
$$\cos^2 x = \tfrac12 \cos 2x + \tfrac12, \qquad \cos x \sin x = \tfrac12 \sin 2x, \qquad \sin^2 x = -\tfrac12 \cos 2x + \tfrac12,$$
in terms of the functions (2.9). Therefore, the original linear combination (2.8) can be
written in the alternative form
$$q(x) = \left(c_0 + \tfrac12 c_3 + \tfrac12 c_5\right) + c_1 \cos x + c_2 \sin x + \left(\tfrac12 c_3 - \tfrac12 c_5\right) \cos 2x + \tfrac12 c_4 \sin 2x$$
$$\phantom{q(x)} = \hat c_0 + \hat c_1 \cos x + \hat c_2 \sin x + \hat c_3 \cos 2x + \hat c_4 \sin 2x, \tag{2.10}$$
and so the functions (2.9) do indeed span $\mathcal{T}^{(2)}$. It is worth noting that we first characterized
$\mathcal{T}^{(2)}$ as the span of 6 functions, whereas the second characterization required only 5
functions. It turns out that 5 is the minimal number of functions needed to span $\mathcal{T}^{(2)}$, but
the proof of this fact will be deferred until Chapter 4.
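
The change of spanning set in (2.10) is easy to test numerically. The sketch below is only a sanity check under illustrative assumptions: the function names are hypothetical and the coefficient values are arbitrary. It converts the six coefficients of (2.8) into the five coefficients of (2.10) and compares the two expressions at several points.

```python
import math

def to_double_angle(c):
    """Map (c0, ..., c5) in (2.8) to the five coefficients appearing in (2.10)."""
    c0, c1, c2, c3, c4, c5 = c
    return (c0 + c3 / 2 + c5 / 2, c1, c2, c3 / 2 - c5 / 2, c4 / 2)

def q_powers(c, x):
    """Evaluate (2.8) using 1, cos x, sin x, cos^2 x, cos x sin x, sin^2 x."""
    co, si = math.cos(x), math.sin(x)
    basis = (1.0, co, si, co * co, co * si, si * si)
    return sum(ci * b for ci, b in zip(c, basis))

def q_double(d, x):
    """Evaluate (2.10) using 1, cos x, sin x, cos 2x, sin 2x."""
    basis = (1.0, math.cos(x), math.sin(x), math.cos(2 * x), math.sin(2 * x))
    return sum(di * b for di, b in zip(d, basis))

c = (1.0, -2.0, 0.5, 3.0, 4.0, -1.0)  # arbitrary illustrative coefficients
d = to_double_angle(c)
for x in (0.0, 0.7, 1.9, -2.4):
    assert math.isclose(q_powers(c, x), q_double(d, x), abs_tol=1e-12)
print(d)
```

Agreement at a handful of points does not prove the identity, of course; the proof is the double-angle computation above.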

(d) The homogeneous linear ordinary differential equation
$$u'' + 2u' - 3u = 0 \tag{2.11}$$
considered in part (i) of Example 2.12 has two solutions: $f_1(x) = e^x$ and $f_2(x) = e^{-3x}$.
(Now may be a good time for you to review the basic techniques for solving linear, constant
coefficient ordinary differential equations, cf. [7,22]; see also Chapter 7.) Its general
solution is, in fact, a linear combination
$$u = c_1 f_1(x) + c_2 f_2(x) = c_1 e^x + c_2 e^{-3x},$$
where $c_1, c_2$ are arbitrary scalars. Thus, the vector space of solutions to (2.11) is described
as the span of these two basic solutions. The fact that there are no other solutions is
not obvious, but relies on the basic uniqueness theorem for ordinary differential equations;
further details can be found in Theorem 7.34.


Remark.  One  can  also  define  the  span  of  an  infinite  collection  of  elements  of  a  vector  space. 
To  avoid  convergence  issues,  one  should  consider  only  finite  linear  combinations  (2.5).  For 
example,  the  span  of  the  monomials  1,  x,  x2,  x3, . . .  is  the  subspace  of  all  polynomials 
—  not  the  space  of  analytic  functions  or  convergent  Taylor  series.  Similarly,  the  span  of 
the functions $1, \cos x, \sin x, \cos 2x, \sin 2x, \cos 3x, \sin 3x, \ldots$ is the space $\mathcal{T}^{(\infty)}$ containing all
trigonometric polynomials, of fundamental importance in the theory of Fourier series, [61].


Exercises 


(3) 

belongs  to  the  subspace  of  IR3  spanned  by 

2\ 

-1 

/  5  \ 
,  -4 

3 ) 

V  2  J 

V  1/ 

2.3.1.  Show  that 

it  as  a  linear  combination  of  the  spanning  vectors. 


by  writing 


2.3.2. Show that $(-3, 7, 6, 1)^T$ is in the subspace of $\mathbb{R}^4$ spanned by $(1, -3, -2, 0)^T$,
$(-2, 6, 3, 4)^T$, and $(-2, 4, 6, -7)^T$.


i\ 

(i\ 

/0\ 

(  1\ 

2.3.3.  (a)  Determine  whether 

-2 

is  in  the  span  of 

i 

and 

1  .  (b)  Is 

-2 

G3  j 

V/ 

d/ 

in  the 


(1\ 


span  of 


/  1 
-2 


/0\ 


? 


\2j  V  0 J  \4j 


(c)  Is 


3  \ 
0 

-1 
—2  j 


in  the  span  of 


2.3.4.  Which  of  the  following  sets  of  vectors  span  all  of  IR2?  (a) 


/1\ 
2 
0 

Vi/ 

l 

-l 


/  o\ 
-1 
3 

V  0/ 
;  O) 


2\ 
0 
1 

V -1/ 


? 


2  )  ( 1  , 
-i  r  1 3  r 


(c) 


2 

1 


1 

2 


;  ( d ) 


6 


4 


;  (e) 


1 

1 


9  r  v  6/,v  y  v — 1 / ’  V — 1 /  \ — 1  / 

3  _ _i  !___  _ ^ _  /  o  M  \T 


3  C  (f)  (  0 

’  ^  >  1  o  /  ’  \  -1  / ’  \  -2  /  ’ 


2.3.5. (a) Graph the subspace of $\mathbb{R}^3$ spanned by the vector $\mathbf{v}_1 = (3, 0, 1)^T$.
(b) Graph the subspace spanned by the vectors $\mathbf{v}_1 = (3, -2, -1)^T$, $\mathbf{v}_2 = (-2, 0, -1)^T$.
(c) Graph the span of $\mathbf{v}_1 = (1, 0, -1)^T$, $\mathbf{v}_2 = (0, -1, 1)^T$, $\mathbf{v}_3 = (1, -1, 0)^T$.

2.3.6. Let $U$ be the subspace of $\mathbb{R}^3$ spanned by $\mathbf{u}_1 = (1, 2, 3)^T$, $\mathbf{u}_2 = (2, -1, 0)^T$. Let $V$ be
the subspace spanned by $\mathbf{v}_1 = (5, 0, 3)^T$, $\mathbf{v}_2 = (3, 1, 3)^T$. Is $V$ a subspace of $U$? Are $U$
and $V$ the same?


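2.3.7. (a) Let $S$ be the subspace of $\mathcal{M}_{2 \times 2}$ consisting of all symmetric $2 \times 2$ matrices. Show that
$S$ is spanned by the matrices $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, $\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$, and $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. (b) Find a spanning set of
the space of symmetric $3 \times 3$ matrices.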
2.3.8. (a) Determine whether the polynomials $x^2 + 1$, $x^2 - 1$, $x^2 + x + 1$ span $\mathcal{P}^{(2)}$.
(b) Do $x^3 - 1$, $x^2 + 1$, $x - 1$, $1$ span $\mathcal{P}^{(3)}$? (c) What about $x^3$, $x^2 + 1$, $x^2 - x$, $x + 1$?

2.3.9. Determine whether any of the following functions lies in the subspace spanned by $1$, $x$,
$\sin x$, $\sin^2 x$: (a) $3 - 5x$, (b) $x^2 + \sin^2 x$, (c) $\sin x - 2 \cos x$, (d) $\cos^2 x$, (e) $x \sin x$, (f) $e^x$.

2.3.10.  Write  the  following  trigonometric  functions  in  phase-amplitude  form: 

(a)  sin3x,  (b)  cos  x  —  sin x,  (c)  3  cos  2x  +  4  sin  2x,  (d)  cos  x  sin x. 

2.3.11. (a) Prove that the set of solutions to the homogeneous ordinary differential equation
$u'' - 4u' + 3u = 0$ is a vector space. (b) Write the solution space as the span of a finite
number of functions. (c) What is the minimal number of functions needed to span the
solution space?


2.3.12. Explain why the functions $1$, $\cos x$, $\sin x$ span the solution space to the third order
ordinary differential equation $u''' + u' = 0$.

2.3.13.  Find  a  finite  set  of  real  functions  that  spans  the  solution  space  to  the  following 
homogeneous  ordinary  differential  equations:  (a)  u  —  2u  =  0,  (b)  u'  -f  4u  =  0, 

(c)  u"  —  3 u  =  0,  (d)  u"  +  u  +  u  =  0,  (e)  u"  —  5 u"  =  0,  (f)  u ^  +  u  =  0. 

2.3.14. Consider the boundary value problem $u'' + 4u = 0$, $0 \leq x \leq \pi$, $u(0) = 0$, $u(\pi) = 0$.

(a)  Prove,  without  solving,  that  the  set  of  solutions  forms  a  vector  space. 

(b)  Write  this  space  as  the  span  of  one  or  more  functions.  Hint :  First  solve  the  differential 
equation;  then  find  out  which  solutions  satisfy  the  boundary  conditions. 


2.3.15.  Which  of  the  following  functions  lie  in  the  span  of  the  vector- valued  functions 

fi^  =  (*)’  f2(*)=(j)-  w  =  (2*)? 


(a) 


2 

1 


(b) 


1  -  2x 
1  —  x 


(c) 


1  -  2x 

—  1  —  X 


id) 


1  +  x‘ 


X 


(e) 


2  —  x 
0 


2.3.16.  True  or  false:  The  zero  vector  belongs  to  the  span  of  any  collection  of  vectors. 


2.3.17.  Prove  or  give  a  counter-example:  if  z  is  a  linear  combination  of  u,v,w,  then  w  is  a 
linear  combination  of  u,  v,  z. 

♦ 2.3.18. Suppose $\mathbf{v}_1, \ldots, \mathbf{v}_m$ span $V$. Let $\mathbf{v}_{m+1}, \ldots, \mathbf{v}_n \in V$ be any other elements. Prove that
the combined collection $\mathbf{v}_1, \ldots, \mathbf{v}_n$ also spans $V$.

♦ 2.3.19. (a) Show that if $\mathbf{v}$ is a linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_m$, and each $\mathbf{v}_j$ is a linear
combination of $\mathbf{w}_1, \ldots, \mathbf{w}_n$, then $\mathbf{v}$ is a linear combination of $\mathbf{w}_1, \ldots, \mathbf{w}_n$.
(b) Suppose $\mathbf{v}_1, \ldots, \mathbf{v}_m$ span $V$. Let $\mathbf{w}_1, \ldots, \mathbf{w}_m \in V$ be any other elements. Suppose that
each $\mathbf{v}_i$ can be written as a linear combination of $\mathbf{w}_1, \ldots, \mathbf{w}_m$. Prove that $\mathbf{w}_1, \ldots, \mathbf{w}_m$ also
span $V$.

♦ 2.3.20. The span of an infinite collection $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots \in V$ of vector space elements is defined
as the set of all finite linear combinations $\sum_{i=1}^{n} c_i \mathbf{v}_i$, where $n < \infty$ is finite but arbitrary.
(a) Prove that the span defines a subspace of the vector space $V$.
(b) What is the span of the monomials $1, x, x^2, x^3, \ldots$?

Linear  Independence  and  Dependence 

Most of the time, all of the vectors used to form a span are essential. For example, we
cannot use fewer than two vectors to span a plane in $\mathbb{R}^3$, since the span of a single vector is
at most a line. However, in degenerate situations, some of the spanning elements may be
redundant. For instance, if the two vectors are parallel, then their span is a line, but only
one of the vectors is really needed to prescribe the line. Similarly, the subspace spanned by
the polynomials $p_1(x) = x - 2$, $p_2(x) = 3x + 4$, $p_3(x) = -x + 1$, is the vector space $\mathcal{P}^{(1)}$
consisting of all linear polynomials. But only two of the polynomials are really required to
span $\mathcal{P}^{(1)}$. (The reason will become clear soon, but you may wish to see whether you can
demonstrate this on your own.) The elimination of such superfluous spanning elements is
encapsulated in the following important definition.

Definition 2.18. The vector space elements $\mathbf{v}_1, \ldots, \mathbf{v}_k \in V$ are called linearly dependent
if there exist scalars $c_1, \ldots, c_k$, not all zero, such that
$$c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k = \mathbf{0}. \tag{2.12}$$
Elements that are not linearly dependent are called linearly independent.


The restriction that not all the $c_i$'s are zero is essential: if $c_1 = \cdots = c_k = 0$, then the
linear combination (2.12) is automatically zero. Thus, to check linear independence, one
needs to show that the only linear combination that produces the zero vector (2.12) is this
trivial one. In other words, $c_1 = \cdots = c_k = 0$ is the one and only solution to the vector
equation (2.12).


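Example 2.19. Some examples of linear independence and dependence:

(a) The vectors
$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix}, \qquad \mathbf{v}_3 = \begin{pmatrix} -1 \\ 4 \\ 3 \end{pmatrix}$$
are linearly dependent, because
$$\mathbf{v}_1 - 2\,\mathbf{v}_2 + \mathbf{v}_3 = \mathbf{0}.$$
On the other hand, the first two vectors $\mathbf{v}_1, \mathbf{v}_2$ are linearly independent. To see this,
suppose that
$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 = \begin{pmatrix} c_1 \\ 2 c_1 + 3 c_2 \\ -c_1 + c_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$
For this to happen, $c_1, c_2$ must satisfy the homogeneous linear system
$$c_1 = 0, \qquad 2 c_1 + 3 c_2 = 0, \qquad -c_1 + c_2 = 0,$$
which, as you can check, has only the trivial solution $c_1 = c_2 = 0$.

(b) In general, any collection $\mathbf{v}_1, \ldots, \mathbf{v}_k$ that includes the zero vector, say $\mathbf{v}_1 = \mathbf{0}$, is
automatically linearly dependent, since $1\,\mathbf{v}_1 + 0\,\mathbf{v}_2 + \cdots + 0\,\mathbf{v}_k = \mathbf{0}$ is a nontrivial linear
combination that adds up to $\mathbf{0}$.

(c) Two vectors $\mathbf{v}, \mathbf{w} \in V$ are linearly dependent if and only if they are parallel, meaning
that one is a scalar multiple of the other. Indeed, if $\mathbf{v} = a\,\mathbf{w}$, then $\mathbf{v} - a\,\mathbf{w} = \mathbf{0}$ is a
nontrivial linear combination summing to zero. Conversely, if $c\,\mathbf{v} + d\,\mathbf{w} = \mathbf{0}$ and $c \neq 0$,
then $\mathbf{v} = -(d/c)\,\mathbf{w}$, while if $c = 0$ but $d \neq 0$, then $\mathbf{w} = \mathbf{0}$.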

(d) The polynomials
$$p_1(x) = x - 2, \qquad p_2(x) = x^2 - 5x + 4, \qquad p_3(x) = 3x^2 - 4x, \qquad p_4(x) = x^2 - 1,$$
are linearly dependent, since
$$p_1(x) + p_2(x) - p_3(x) + 2\,p_4(x) = 0$$
is a nontrivial linear combination that vanishes identically. On the other hand, the first
three polynomials,
$$p_1(x) = x - 2, \qquad p_2(x) = x^2 - 5x + 4, \qquad p_3(x) = 3x^2 - 4x,$$
are linearly independent. Indeed, if the linear combination
$$c_1 p_1(x) + c_2 p_2(x) + c_3 p_3(x) = (c_2 + 3 c_3)\,x^2 + (c_1 - 5 c_2 - 4 c_3)\,x - 2 c_1 + 4 c_2 = 0$$
is the zero polynomial, then its coefficients must vanish, and hence $c_1, c_2, c_3$ are required
to solve the homogeneous linear system
$$c_2 + 3 c_3 = 0, \qquad c_1 - 5 c_2 - 4 c_3 = 0, \qquad -2 c_1 + 4 c_2 = 0.$$
But this has only the trivial solution $c_1 = c_2 = c_3 = 0$, and so linear independence follows.
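
Both claims in part (d) can be verified by storing each polynomial as its vector of coefficients. The sketch below is an illustration under that assumption; the helper names `combine` and `det3` are hypothetical. The dependence relation is checked directly, and the independence of the first three polynomials is checked through the nonvanishing of a $3 \times 3$ determinant, which is equivalent to the homogeneous system above having only the trivial solution.

```python
# each polynomial is stored as its coefficient tuple (constant, x, x^2)
p1 = (-2, 1, 0)   # x - 2
p2 = (4, -5, 1)   # x^2 - 5x + 4
p3 = (0, -4, 3)   # 3x^2 - 4x
p4 = (-1, 0, 1)   # x^2 - 1

def combine(coeffs, polys):
    """Coefficient tuple of the linear combination sum(coeffs[j] * polys[j])."""
    return tuple(sum(c * p[i] for c, p in zip(coeffs, polys)) for i in range(len(polys[0])))

def det3(rows):
    """Determinant of a 3 x 3 matrix given as three rows (cofactor expansion)."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# p1 + p2 - p3 + 2 p4 should be the zero polynomial
print(combine((1, 1, -1, 2), (p1, p2, p3, p4)))

# a nonzero determinant means p1, p2, p3 are linearly independent
print(det3((p1, p2, p3)))
```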


Remark. In the last example, we are using the basic fact that a polynomial is identically
zero,
$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = 0 \quad \text{for all } x,$$
if and only if its coefficients all vanish: $a_0 = a_1 = \cdots = a_n = 0$. This is equivalent to the
“obvious” fact that the basic monomial functions $1, x, x^2, \ldots, x^n$ are linearly independent.
Exercise 2.3.36 asks for a bona fide proof.

Example 2.20. The trigonometric functions
$$1, \quad \cos x, \quad \sin x, \quad \cos^2 x, \quad \cos x \sin x, \quad \sin^2 x,$$
which were used to define the vector space $\mathcal{T}^{(2)}$ of quadratic trigonometric polynomials,
are, in fact, linearly dependent. This is a consequence of the basic trigonometric identity
$$\cos^2 x + \sin^2 x = 1,$$
which can be rewritten as a nontrivial linear combination
$$1 + 0 \cos x + 0 \sin x + (-1) \cos^2 x + 0 \cos x \sin x + (-1) \sin^2 x = 0$$
that equals the zero function. On the other hand, the alternative spanning set
$$1, \quad \cos x, \quad \sin x, \quad \cos 2x, \quad \sin 2x$$
is linearly independent, since the only identically zero linear combination,
$$c_0 + c_1 \cos x + c_2 \sin x + c_3 \cos 2x + c_4 \sin 2x = 0,$$
turns out to be the trivial one $c_0 = \cdots = c_4 = 0$. However, the latter fact is not as obvious,
and requires a bit of work to prove directly; see Exercise 2.3.37. An easier proof, based on
orthogonality, will appear in Chapter 4.


Let us now focus our attention on the linear independence or dependence of a set
of vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k \in \mathbb{R}^n$ in Euclidean space. We begin by forming the $n \times k$ matrix
$A = ( \mathbf{v}_1 \ \ldots \ \mathbf{v}_k )$ whose columns are the given vectors. (The fact that we use column
vectors is essential here.) Our analysis is based on the very useful formula
$$A\,\mathbf{c} = c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k, \qquad \text{where} \qquad \mathbf{c} = ( c_1, c_2, \ldots, c_k )^T, \tag{2.13}$$
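that expresses any linear combination in terms of matrix multiplication. For example,

$$\begin{pmatrix} 1 & 3 & 0 \\ -1 & 2 & 1 \\ 4 & -1 & -2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} c_1 + 3c_2 \\ -c_1 + 2c_2 + c_3 \\ 4c_1 - c_2 - 2c_3 \end{pmatrix} = c_1 \begin{pmatrix} 1 \\ -1 \\ 4 \end{pmatrix} + c_2 \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix} + c_3 \begin{pmatrix} 0 \\ 1 \\ -2 \end{pmatrix}.$$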


Formula (2.13) follows directly from the rules of matrix multiplication; see also Exercise 1.2.34(c). It enables us to reformulate the notions of linear independence and span of vectors in $\mathbb{R}^n$ in terms of linear algebraic systems. The key result is the following:


Theorem 2.21. Let $\mathbf{v}_1, \ldots, \mathbf{v}_k \in \mathbb{R}^n$ and let $A = (\, \mathbf{v}_1 \ \ldots \ \mathbf{v}_k \,)$ be the corresponding $n \times k$ matrix whose columns are the given vectors.

(a) The vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k \in \mathbb{R}^n$ are linearly dependent if and only if there is a non-zero solution $\mathbf{c} \neq \mathbf{0}$ to the homogeneous linear system $A\,\mathbf{c} = \mathbf{0}$.

(b) The vectors are linearly independent if and only if the only solution to the homogeneous system $A\,\mathbf{c} = \mathbf{0}$ is the trivial one, $\mathbf{c} = \mathbf{0}$.

(c) A vector $\mathbf{b}$ lies in the span of $\mathbf{v}_1, \ldots, \mathbf{v}_k$ if and only if the linear system $A\,\mathbf{c} = \mathbf{b}$ is compatible, i.e., has at least one solution.


Proof: We prove the first statement, leaving the other two as exercises for the reader. The condition that $\mathbf{v}_1, \ldots, \mathbf{v}_k$ be linearly dependent is that there exists a nonzero vector

$$\mathbf{c} = (\, c_1, c_2, \ldots, c_k \,)^T \quad \text{such that} \quad A\,\mathbf{c} = c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k = \mathbf{0}.$$

Therefore, linear dependence requires the existence of a nontrivial solution to the homogeneous linear system $A\,\mathbf{c} = \mathbf{0}$. Q.E.D.


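Example 2.22. Let us determine whether the vectors

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} 3 \\ 0 \\ 4 \end{pmatrix}, \qquad \mathbf{v}_3 = \begin{pmatrix} 1 \\ -4 \\ 6 \end{pmatrix}, \qquad \mathbf{v}_4 = \begin{pmatrix} 4 \\ 2 \\ 3 \end{pmatrix}, \tag{2.14}$$

are linearly independent or linearly dependent. We combine them as column vectors into a single matrix

$$A = \begin{pmatrix} 1 & 3 & 1 & 4 \\ 2 & 0 & -4 & 2 \\ -1 & 4 & 6 & 3 \end{pmatrix}.$$

According to Theorem 2.21, we need to figure out whether there are any nontrivial solutions to the homogeneous equation $A\,\mathbf{c} = \mathbf{0}$; this can be done by reducing $A$ to row echelon form

$$U = \begin{pmatrix} 1 & 3 & 1 & 4 \\ 0 & -6 & -6 & -6 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \tag{2.15}$$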


The general solution to the homogeneous system $A\,\mathbf{c} = \mathbf{0}$ is $\mathbf{c} = (\, 2c_3 - c_4, \; -c_3 - c_4, \; c_3, \; c_4 \,)^T$, where $c_3, c_4$, the free variables, are arbitrary. Any nonzero choice of $c_3, c_4$ will produce a nontrivial linear combination

$$(2c_3 - c_4)\,\mathbf{v}_1 + (-c_3 - c_4)\,\mathbf{v}_2 + c_3 \mathbf{v}_3 + c_4 \mathbf{v}_4 = \mathbf{0}$$

that adds up to the zero vector. We conclude that the vectors (2.14) are linearly dependent.
In fact, in this particular case, we didn't even need to complete the row reduction if we only need to check linear (in)dependence. According to Theorem 1.47, any coefficient matrix with more columns than rows automatically has a nontrivial solution to the associated homogeneous system. This implies the following result:

Lemma 2.23. Any collection of $k > n$ vectors in $\mathbb{R}^n$ is linearly dependent.

Warning. The converse to this lemma is not true. For example, $\mathbf{v}_1 = (\, 1, 2, 3 \,)^T$ and $\mathbf{v}_2 = (\, -2, -4, -6 \,)^T$ are two linearly dependent vectors in $\mathbb{R}^3$, since $2\,\mathbf{v}_1 + \mathbf{v}_2 = \mathbf{0}$. For a collection of $n$ or fewer vectors in $\mathbb{R}^n$, one needs to analyze the homogeneous linear system.

Lemma  2.23  is  a  particular  case  of  the  following  general  characterization  of  linearly 
independent  vectors. 

Proposition 2.24. A set of $k$ vectors in $\mathbb{R}^n$ is linearly independent if and only if the corresponding $n \times k$ matrix $A$ has rank $k$. In particular, this requires $k \le n$.

Or,  to  state  the  result  another  way,  the  vectors  are  linearly  independent  if  and  only 
if  the  homogeneous  linear  system  Ac  =  0  has  no  free  variables.  Proposition  2.24  is  an 
immediate  corollary  of  Theorems  2.21  and  1.47. 
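
The rank criterion of Proposition 2.24 translates directly into a short computation. The sketch below uses exact rational arithmetic from Python's standard `fractions` module; the function names `rank`, `columns_to_matrix`, and `linearly_independent` are illustrative choices, not notation from the text. It is applied to the vectors (2.14) and to the pair of vectors from the Warning after Lemma 2.23.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it corresponds to a free variable
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def columns_to_matrix(vectors):
    """Build the n x k matrix whose columns are the given vectors."""
    return [[v[i] for v in vectors] for i in range(len(vectors[0]))]

def linearly_independent(vectors):
    """k vectors are independent exactly when their n x k matrix has rank k."""
    return rank(columns_to_matrix(vectors)) == len(vectors)

vectors = [(1, 2, -1), (3, 0, 4), (1, -4, 6), (4, 2, 3)]  # the vectors (2.14)

print(rank(columns_to_matrix(vectors)))               # rank of the 3 x 4 matrix A
print(linearly_independent(vectors))                  # all four vectors together
print(linearly_independent(vectors[:2]))              # only the first two
print(linearly_independent([(1, 2, 3), (-2, -4, -6)]))  # the pair from the Warning
```

Exact fractions are used so that a zero pivot is detected reliably; a floating-point version would need a tolerance for deciding when a pivot counts as zero.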


Example 2.22 (continued). Let us now see which vectors $\mathbf{b} \in \mathbb{R}^3$ lie in the span of the vectors (2.14). According to Theorem 2.21, this will be the case if and only if the linear system $A\,\mathbf{c} = \mathbf{b}$ has a solution. Since the resulting row echelon form (2.15) has a row of all zeros, there will be a compatibility condition on the entries of $\mathbf{b}$, and hence not every vector lies in the span. To find the precise condition, we augment the coefficient matrix, and apply the same row operations, leading to the reduced augmented matrix

$$\left( \begin{array}{cccc|c} 1 & 3 & 1 & 4 & b_1 \\ 0 & -6 & -6 & -6 & b_2 - 2b_1 \\ 0 & 0 & 0 & 0 & b_3 + \tfrac{7}{6} b_2 - \tfrac{4}{3} b_1 \end{array} \right).$$

Therefore, $\mathbf{b} = (\, b_1, b_2, b_3 \,)^T$ lies in the span if and only if $-\tfrac{4}{3} b_1 + \tfrac{7}{6} b_2 + b_3 = 0$. Thus, these four vectors span only a plane in $\mathbb{R}^3$.


The same method demonstrates that a collection of vectors will span all of $\mathbb{R}^n$ if and only if the row echelon form of the associated matrix contains no all-zero rows, or, equivalently, the rank is equal to $n$, the number of rows in the matrix.

Proposition 2.25. A collection of $k$ vectors spans $\mathbb{R}^n$ if and only if their $n \times k$ matrix has rank $n$. In particular, this requires $k \ge n$.

Warning. Not every collection of $n$ or more vectors in $\mathbb{R}^n$ will span all of $\mathbb{R}^n$. A counterexample was already provided by the vectors (2.14).
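
Part (c) of Theorem 2.21 and Proposition 2.25 can be checked in the same computational style. A standard way to test compatibility of $A\,\mathbf{c} = \mathbf{b}$ is to compare the rank of $A$ with the rank of the augmented matrix; that comparison is the technique sketched here, with illustrative function names (`in_span`, `spans_all`) that are assumptions of this example. The test vectors are the ones in (2.14).

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def in_span(vectors, b):
    """b is in the span iff appending b as a column does not raise the rank."""
    A = [[v[i] for v in vectors] for i in range(len(b))]
    augmented = [row + [b[i]] for i, row in enumerate(A)]
    return rank(A) == rank(augmented)

def spans_all(vectors):
    """The vectors span R^n iff their n x k matrix has rank n."""
    n = len(vectors[0])
    A = [[v[i] for v in vectors] for i in range(n)]
    return rank(A) == n

vectors = [(1, 2, -1), (3, 0, 4), (1, -4, 6), (4, 2, 3)]  # the vectors (2.14)

def compatibility(b):
    """Left-hand side of the condition -4/3 b1 + 7/6 b2 + b3 = 0."""
    return Fraction(-4, 3) * b[0] + Fraction(7, 6) * b[1] + b[2]

for b in [(0, 6, -7), (1, 0, 0)]:
    print(b, in_span(vectors, b), compatibility(b) == 0)
print(spans_all(vectors))
```

For each test vector, the rank comparison and the explicit compatibility condition derived in Example 2.22 give the same verdict.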
Exercises


2.3.21.  Determine  whether  the  given  vectors  are  linearly  independent  or  linearly  dependent: 

1\  (  0\ 

(a)  >  (  X  )  >  0)  (  q  M  _«  )  >  (c)  (  X  ).  (  q  ).  (  o  ).  (d) 


(e) 


1\ 

-1 

OJ 


3 

•2  J 


(g) 


2 

V-l  / 


4\ 

/  — 6  \ 

( 

/-1\ 

5\ 

/1\ 

fl\ 

/  2  \ 

(1\ 

2 

-3 

.  (h) 

1 

3 

1 

0 

0 

2 

2 

0 

? 

0 

-1 

1 

1 

5 

2 

i 

5 

0 

1 

1 

5 

3 

V  — 6/ 

V  9 ) 

l  3 ) 

^  0 ) 

\-‘i) 

w 

\l) 

w 

U  / 

2.3.22. (a) Show that the vectors $(\, 1, 0, 2, 1 \,)^T$, $(\, -2, 3, -1, 1 \,)^T$, $(\, 2, -2, 1, -1 \,)^T$ are linearly independent. (b) Which of the following vectors are in their span? (i)


/ 1  \ 

( 1\ 

f0\ 

f0\ 

1 

2 

,  (ii) 

0 

0 

,  (in) 

1 

0 

,  (iv) 

0 

0 

V  i  / 

\o) 

\0j 

(c)  Suppose  b  =  (a,b,  c,  d)T  lies  in  their  span.  What  conditions  must  a,  b ,  c,  d  satisfy? 


2.3.23.  (a)  Show  that  the  vectors 


1\ 

/  1\ 

n 

n 

1 

1 

-1 

-i 

1 

? 

-1 

5 

0 

? 

0 

w 

^  0^ 

b 

W/ 

are  linearly  independent. 


(b) Show that they also span $\mathbb{R}^4$. (c) Write $(\, 1, 0, 0, 1 \,)^T$ as a linear combination of them.

2.3.24.  Determine  whether  the  given  row  vectors  are  linearly  independent  or  linearly  dependent: 

(a)  (2,1), (-1,3), (5, 2),  (b)  (1, 2,-1), (2, 4, -2),  (c)  ( 1,  2,  3 ) ,  ( 1, 4,  8 ) ,  ( 1,  5,  7 ), 

(d)  (1,1,0), (1,0, 3), (2, 2,1), (1,3, 4),  (e)  ( 1,  2,  0,  3 ) ,  ( -3, -1,  2, -2  ) ,  ( 3, -4, -4,  5  ) , 

(f)  (2, 1,-1, 3), (-1,3, 1,0), (5, 1,2, -3). 

2.3.25.  True  or  false:  The  six  3x3  permutation  matrices  (1.30)  are  linearly  independent. 

2.3.26.  True  or  false:  A  set  of  vectors  is  linearly  dependent  if  the  zero  vector  belongs  to  their  span. 

2.3.27.  Does  a  single  vector  ever  define  a  linearly  dependent  set? 

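2.3.28. Let $\mathbf{x}$ and $\mathbf{y}$ be linearly independent elements of a vector space $V$. Show that $\mathbf{u} = a\,\mathbf{x} + b\,\mathbf{y}$ and $\mathbf{v} = c\,\mathbf{x} + d\,\mathbf{y}$ are linearly independent if and only if $ad - bc \neq 0$. Is the entire collection $\mathbf{x}, \mathbf{y}, \mathbf{u}, \mathbf{v}$ linearly independent?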


2.3.29. Prove or give a counterexample to the following statement: If $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are elements of a vector space $V$ that do not span $V$, then $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are linearly independent.

♦ 2.3.30. Prove parts (b) and (c) of Theorem 2.21.

♦ 2.3.31. (a) Prove that if $\mathbf{v}_1, \ldots, \mathbf{v}_m$ are linearly independent, then every subset, e.g., $\mathbf{v}_1, \ldots, \mathbf{v}_k$ with $k < m$, is also linearly independent. (b) Does the same hold true for linearly dependent vectors? Prove or give a counterexample.

2.3.32. (a) Determine whether the polynomials $f_1(x) = x^2 - 3$, $f_2(x) = 2 - x$, $f_3(x) = (x - 1)^2$ are linearly independent or linearly dependent.

(b) Do they span the vector space of all quadratic polynomials?
2.3.33. Determine whether the given functions are linearly independent or linearly dependent: (a) $2 - x^2$, $3x$, $x^2 + x - 2$; (b) $3x - 1$, $x(2x + 1)$, $x(x - 1)$; (c) $e^x$, $e^{x+1}$; (d) $\sin x$, $\sin(x + 1)$; (e) $e^x$, $e^{x+1}$, $e^{x+2}$; (f) $\sin x$, $\sin(x + 1)$, $\sin(x + 2)$; (g) $e^x$, $x e^x$, $x^2 e^x$; (h) $e^x$, $e^{2x}$, $e^{3x}$; (i) $x + y$, $x - y + 1$, $x + 3y + 2$; these are functions of two variables.

2.3.34. Show that the functions $f(x) = x$ and $g(x) = |x|$ are linearly independent when considered as functions on all of $\mathbb{R}$, but are linearly dependent when considered as functions defined only on $\mathbb{R}^+ = \{ x > 0 \}$.

♥ 2.3.35. (a) Prove that the polynomials $p_i(x) = \sum_{j=0}^{n} a_{ij}\, x^j$ for $i = 1, \ldots, k$ are linearly independent if and only if the $k \times (n + 1)$ matrix $A$ whose entries are their coefficients $a_{ij}$, $1 \le i \le k$, $0 \le j \le n$, has rank $k$. (b) Formulate a similar matrix condition for testing whether another polynomial $q(x)$ lies in their span. (c) Use (a) to determine

whether  Pi(x)  =  x3  —  1,  p2(x)  =  —  2x  +  4,  p3(x)  =  x4  —  4x,  p4(x)  =  x2  +  1, 

4  3 

p5(x)  =  —  x  +4x  +  2x  +  1  are  linearly  independent  or  linearly  dependent,  (d)  Does  the 
polynomial $q(x) = x^3$ lie in their span? If so, find a linear combination that adds up to $q(x)$.

♦ 2.3.36. The Fundamental Theorem of Algebra, [26], states that a non-zero polynomial of degree $n$ has at most $n$ distinct real roots, that is, real numbers $x$ such that $p(x) = 0$. Use this fact to prove linear independence of the monomial functions $1, x, x^2, \ldots, x^n$.

Remark. An elementary proof of the latter fact can be found in Exercise 5.5.38.

♥ 2.3.37. (a) Let $x_1, x_2, \ldots, x_n$ be a set of distinct sample points. Prove that the functions $f_1(x), \ldots, f_k(x)$ are linearly independent if their sample vectors $\mathbf{f}_1, \ldots, \mathbf{f}_k$ are linearly independent vectors in $\mathbb{R}^n$. (b) Give an example of linearly independent functions that have linearly dependent sample vectors. (c) Use this method to prove that the functions $1$, $\cos x$, $\sin x$, $\cos 2x$, $\sin 2x$ are linearly independent. Hint: You need at least 5 sample points.


2.3.38. Suppose $\mathbf{f}_1(t), \ldots, \mathbf{f}_k(t)$ are vector-valued functions from $\mathbb{R}$ to $\mathbb{R}^n$. (a) Prove that if $\mathbf{f}_1(t_0), \ldots, \mathbf{f}_k(t_0)$ are linearly independent vectors in $\mathbb{R}^n$ at one point $t_0$, then $\mathbf{f}_1(t), \ldots, \mathbf{f}_k(t)$


are  linearly  independent  functions,  (b)  Show  that  ^(t) 


and  f2(t) 


2t  -  1 
2 12  -  t 


are 


linearly  independent  functions,  even  though  at  each  t0,  the  vectors  f1  (t0),  f2(t0)  are  linearly 
dependent.  Therefore,  the  converse  to  the  result  in  part  (a)  is  not  valid. 


♥ 2.3.39. The Wronskian of a pair of differentiable functions $f(x), g(x)$ is the scalar function

$$W[\,f(x), g(x)\,] = \det \begin{pmatrix} f(x) & g(x) \\ f'(x) & g'(x) \end{pmatrix} = f(x)\,g'(x) - f'(x)\,g(x). \tag{2.16}$$

(a) Prove that if $f, g$ are linearly dependent, then $W[\,f(x), g(x)\,] \equiv 0$. Hence, if $W[\,f(x), g(x)\,] \not\equiv 0$, then $f, g$ are linearly independent. (b) Let $f(x) = x^3$, $g(x) = |x|^3$. Prove that $f, g \in C^2$ are twice continuously differentiable and linearly independent, but $W[\,f(x), g(x)\,] \equiv 0$. Thus, the Wronskian is not a fool-proof test for linear independence.

Remark. It can be proved that if $f, g$ both satisfy a second order linear ordinary differential equation, then $f, g$ are linearly dependent if and only if $W[\,f(x), g(x)\,] \equiv 0$.


2.4  Basis  and  Dimension 

In  order  to  span  a  vector  space  or  subspace,  we  must  employ  a  sufficient  number  of  distinct 
elements.  On  the  other  hand,  including  too  many  elements  in  the  spanning  set  will  violate 
linear  independence,  and  cause  redundancies.  The  optimal  spanning  sets  are  those  that  are 
also linearly independent. By combining the properties of span and linear independence, we arrive at the all-important concept of a "basis".

Definition 2.26. A basis of a vector space $V$ is a finite collection of elements $\mathbf{v}_1, \ldots, \mathbf{v}_n \in V$ that (a) spans $V$, and (b) is linearly independent.


Bases are absolutely fundamental in all areas of linear algebra and linear analysis, including matrix algebra, Euclidean geometry, statistical analysis, solutions to linear differential
equations  —  both  ordinary  and  partial  —  linear  boundary  value  problems,  Fourier  analysis, 
signal  and  image  processing,  data  compression,  control  systems,  and  many  others. 

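Example 2.27. The standard basis of $\mathbb{R}^n$ consists of the $n$ vectors

$$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad \ldots \qquad \mathbf{e}_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}, \tag{2.17}$$

so that $\mathbf{e}_i$ is the vector with 1 in the $i$th slot and 0's elsewhere. We already encountered these vectors: they are the columns of the $n \times n$ identity matrix. They clearly span $\mathbb{R}^n$, since we can write any vector

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = x_1 \mathbf{e}_1 + x_2 \mathbf{e}_2 + \cdots + x_n \mathbf{e}_n \tag{2.18}$$

as a linear combination, whose coefficients are its entries. Moreover, the only linear combination that yields the zero vector $\mathbf{x} = \mathbf{0}$ is the trivial one $x_1 = \cdots = x_n = 0$, which shows that $\mathbf{e}_1, \ldots, \mathbf{e}_n$ are linearly independent.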


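In the three-dimensional case $\mathbb{R}^3$, a common physical notation for the standard basis is

$$\mathbf{i} = \mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad \mathbf{j} = \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad \mathbf{k} = \mathbf{e}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \tag{2.19}$$

This is but one of many possible bases for $\mathbb{R}^3$. Indeed, any three non-coplanar vectors can be used to form a basis. This is a consequence of the following general characterization of bases in Euclidean space as the columns of a nonsingular matrix.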


Theorem 2.28. Every basis of $\mathbb{R}^n$ consists of exactly $n$ vectors. Furthermore, a set of $n$ vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n \in \mathbb{R}^n$ is a basis if and only if the $n \times n$ matrix $A = (\, \mathbf{v}_1 \ \ldots \ \mathbf{v}_n \,)$ is nonsingular: $\operatorname{rank} A = n$.


Proof: This is a direct consequence of Theorem 2.21. Linear independence requires that the only solution to the homogeneous system $A\,\mathbf{c} = \mathbf{0}$ be the trivial one $\mathbf{c} = \mathbf{0}$. On the other hand, a vector $\mathbf{b} \in \mathbb{R}^n$ will lie in the span of $\mathbf{v}_1, \ldots, \mathbf{v}_n$ if and only if the linear system $A\,\mathbf{c} = \mathbf{b}$ has a solution. For $\mathbf{v}_1, \ldots, \mathbf{v}_n$ to span all of $\mathbb{R}^n$, this must hold for all possible right-hand sides $\mathbf{b}$. Theorem 1.7 tells us that both results require that $A$ be nonsingular, i.e., have maximal rank $n$. Q.E.D.
Thus, every basis of $n$-dimensional Euclidean space $\mathbb{R}^n$ contains the same number of vectors, namely $n$. This is a general fact, which motivates a linear algebraic characterization of dimension.

Theorem 2.29. Suppose the vector space $V$ has a basis $\mathbf{v}_1, \ldots, \mathbf{v}_n$ for some $n \in \mathbb{N}$. Then every other basis of $V$ has the same number, $n$, of elements in it. This number is called the dimension of $V$, and written $\dim V = n$.

The proof of Theorem 2.29 rests on the following lemma.

Lemma 2.30. Suppose $\mathbf{v}_1, \ldots, \mathbf{v}_n$ span a vector space $V$. Then every set of $k > n$ elements $\mathbf{w}_1, \ldots, \mathbf{w}_k \in V$ is linearly dependent.

Proof: Let us write each element

$$\mathbf{w}_j = \sum_{i=1}^{n} a_{ij}\, \mathbf{v}_i, \qquad j = 1, \ldots, k,$$

as a linear combination of the spanning set. Then

$$c_1 \mathbf{w}_1 + \cdots + c_k \mathbf{w}_k = \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij}\, c_j\, \mathbf{v}_i.$$

This linear combination will be zero whenever $\mathbf{c} = (\, c_1, c_2, \ldots, c_k \,)^T$ solves the homogeneous linear system

$$\sum_{j=1}^{k} a_{ij}\, c_j = 0, \qquad i = 1, \ldots, n,$$

consisting of $n$ equations in $k > n$ unknowns. Theorem 1.47 guarantees that every homogeneous system with more unknowns than equations always has a non-trivial solution $\mathbf{c} \neq \mathbf{0}$, and this immediately implies that $\mathbf{w}_1, \ldots, \mathbf{w}_k$ are linearly dependent. Q.E.D.

Proof of Theorem 2.29: Suppose we have two bases containing a different number of elements. By definition, the smaller basis spans the vector space. But then Lemma 2.30 tells us that the elements in the larger purported basis must be linearly dependent, which contradicts our initial assumption that the latter is a basis. Q.E.D.

As  a  direct  consequence,  we  can  now  give  a  precise  meaning  to  the  optimality  of  bases. 

Theorem  2.31.  Suppose  V  is  an  n-dimensional  vector  space.  Then 

(a)  Every  set  of  more  than  n  elements  of  V  is  linearly  dependent. 

(b)  No  set  of  fewer  than  n  elements  spans  V. 

(c)  A  set  of  n  elements  forms  a  basis  if  and  only  if  it  spans  V. 

(d)  A  set  of  n  elements  forms  a  basis  if  and  only  if  it  is  linearly  independent. 

In  other  words,  once  we  know  the  dimension  of  a  vector  space,  to  check  that  a  collection 
having  the  correct  number  of  elements  forms  a  basis,  we  only  need  establish  one  of  the 
two  defining  properties:  span  or  linear  independence.  Thus,  n  elements  that  span  an  n- 
dimensional  vector  space  are  automatically  linearly  independent  and  hence  form  a  basis; 
conversely,  n  linearly  independent  elements  of  an  n-dimensional  vector  space  automatically 
span  the  space  and  so  form  a  basis. 

Example 2.32. The standard basis of the space $\mathcal{P}^{(n)}$ of polynomials of degree $\le n$ is given by the $n + 1$ monomials $1, x, x^2, \ldots, x^n$. We conclude that the vector space $\mathcal{P}^{(n)}$ has dimension $n + 1$. Any other basis of $\mathcal{P}^{(n)}$ must contain precisely $n + 1$ polynomials. But not every collection of $n + 1$ polynomials in $\mathcal{P}^{(n)}$ is a basis: they must be linearly independent. We conclude that no set of $n$ or fewer polynomials can span $\mathcal{P}^{(n)}$, while any collection of $n + 2$ or more polynomials of degree $\le n$ is automatically linearly dependent.
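
Since a polynomial of degree $\le n$ is determined by its $n + 1$ coefficients, checking whether $n + 1$ given polynomials form a basis reduces to a rank computation on their coefficient vectors. The sketch below assumes polynomials are stored as coefficient lists `[a0, a1, ..., an]` with respect to the monomials $1, x, \ldots, x^n$; the function name `is_basis_of_polynomials` and the sample polynomials are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_basis_of_polynomials(coefficient_lists, n):
    """True when the polynomials form a basis of the polynomials of degree <= n.

    Each polynomial is a list [a0, a1, ..., an]. A basis needs exactly n + 1
    linearly independent members, i.e. a coefficient matrix of rank n + 1.
    """
    if len(coefficient_lists) != n + 1:
        return False
    return rank(coefficient_lists) == n + 1

shifted = [[1, 0, 0], [1, 1, 0], [1, 2, 1]]     # 1, 1 + x, (1 + x)^2
low_degree = [[1, 1, 0], [1, -1, 0], [1, 0, 0]]  # 1 + x, 1 - x, 1

print(is_basis_of_polynomials(shifted, 2))
print(is_basis_of_polynomials(low_degree, 2))
```

The second collection fails because all three polynomials have degree at most 1, so they lie in a two-dimensional subspace and cannot be linearly independent.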

By definition, every vector space of dimension $1 \le n < \infty$ has a basis. If a vector space $V$ has no basis, it is either the trivial vector space $V = \{\mathbf{0}\}$, which by convention has dimension 0, or its dimension is infinite. An infinite-dimensional vector space contains an infinite collection of linearly independent elements, and hence no (finite) basis. Examples of infinite-dimensional vector spaces include most spaces of functions, such as the spaces of continuous, differentiable, or mean zero functions, as well as the space of all polynomials, and the space of solutions to a linear homogeneous partial differential equation. (On the other hand, the solution space for a homogeneous linear ordinary differential equation turns out to be a finite-dimensional vector space.) There is a well-developed concept of a "complete basis" of certain infinite-dimensional function spaces, [67, 68], but this requires more delicate analytical considerations that lie beyond our present abilities. Thus, in this book, the term "basis" always means a finite collection of vectors in a finite-dimensional vector space.

Proposition 2.33. If $\mathbf{v}_1, \ldots, \mathbf{v}_m$ span the vector space $V$, then $\dim V \le m$.

Thus, every vector space spanned by a finite number of elements is necessarily finite-dimensional, and so, if non-zero, admits a basis. Indeed, one can find the basis by successively looking at the members of a collection of spanning vectors, and retaining those that cannot be expressed as linear combinations of their predecessors in the list. Therefore, $n = \dim V$ is the maximal number of linearly independent vectors in the set $\mathbf{v}_1, \ldots, \mathbf{v}_m$. The details of the proof are left to the reader; see Exercise 2.4.22.
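
The sifting procedure described after Proposition 2.33 can be sketched for vectors in $\mathbb{R}^n$ as follows. A vector is retained exactly when adding it to the vectors already kept raises the rank, which is one way of testing that it is not a linear combination of its predecessors. The function name `extract_basis` is an assumption of this illustration, and the spanning set used is the one from (2.14).

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def extract_basis(spanning_vectors):
    """Keep each vector that is not a combination of the ones already kept."""
    basis = []
    for v in spanning_vectors:
        candidate = basis + [v]
        # matrix whose columns are the candidate vectors
        A = [[w[i] for w in candidate] for i in range(len(v))]
        if rank(A) == len(candidate):  # still linearly independent
            basis.append(v)
    return basis

spanning = [(1, 2, -1), (3, 0, 4), (1, -4, 6), (4, 2, 3)]  # the vectors (2.14)
basis = extract_basis(spanning)
print(basis, len(basis))
```

The number of vectors retained equals the dimension of the span, in agreement with the earlier conclusion that the vectors (2.14) span only a plane. A different ordering of the spanning set may retain different vectors, but always the same number of them.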

Lemma 2.34. The elements $\mathbf{v}_1, \ldots, \mathbf{v}_n$ form a basis of $V$ if and only if every $\mathbf{x} \in V$ can be written uniquely as a linear combination of the basis elements:

$$\mathbf{x} = c_1 \mathbf{v}_1 + \cdots + c_n \mathbf{v}_n = \sum_{i=1}^{n} c_i\, \mathbf{v}_i. \tag{2.20}$$

Proof: The fact that a basis spans $V$ implies that every $\mathbf{x} \in V$ can be written as some linear combination of the basis elements. Suppose we can write an element

$$\mathbf{x} = c_1 \mathbf{v}_1 + \cdots + c_n \mathbf{v}_n = \widehat{c}_1 \mathbf{v}_1 + \cdots + \widehat{c}_n \mathbf{v}_n \tag{2.21}$$

as two different combinations. Subtracting one from the other, we obtain

$$(c_1 - \widehat{c}_1)\, \mathbf{v}_1 + \cdots + (c_n - \widehat{c}_n)\, \mathbf{v}_n = \mathbf{0}.$$

The left-hand side is a linear combination of the basis elements, and hence vanishes if and only if all its coefficients $c_i - \widehat{c}_i = 0$, meaning that the two linear combinations (2.21) are one and the same. Q.E.D.

One sometimes refers to the coefficients $(c_1, \ldots, c_n)$ in (2.20) as the coordinates of the vector $\mathbf{x}$ with respect to the given basis. For the standard basis (2.17) of $\mathbb{R}^n$, the coordinates of a vector $\mathbf{x} = (\, x_1, x_2, \ldots, x_n \,)^T$ are its entries, i.e., its usual Cartesian coordinates, cf. (2.18).
Example 2.35. A Wavelet Basis. The vectors

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \qquad \mathbf{v}_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \qquad \mathbf{v}_4 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix}, \tag{2.22}$$

form a basis of $\mathbb{R}^4$. This is verified by performing Gaussian Elimination on the corresponding $4 \times 4$ matrix

$$A = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & -1 & 0 \\ 1 & -1 & 0 & 1 \\ 1 & -1 & 0 & -1 \end{pmatrix}$$

to check that it is nonsingular. This is a very simple example of a wavelet basis. Wavelets play an increasingly central role in modern signal and digital image processing; see Section 9.7 and [18, 88].

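How do we find the coordinates of a vector, say $\mathbf{x} = (\, 4, -2, 1, 5 \,)^T$, relative to the wavelet basis? We need to find the coefficients $c_1, c_2, c_3, c_4$ such that

$$\mathbf{x} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + c_3 \mathbf{v}_3 + c_4 \mathbf{v}_4.$$

We use (2.13) to rewrite this equation in matrix form $\mathbf{x} = A\,\mathbf{c}$, where $\mathbf{c} = (\, c_1, c_2, c_3, c_4 \,)^T$. Solving the resulting linear system by Gaussian Elimination produces

$$c_1 = 2, \qquad c_2 = -1, \qquad c_3 = 3, \qquad c_4 = -2,$$

which are the coordinates of

$$\mathbf{x} = \begin{pmatrix} 4 \\ -2 \\ 1 \\ 5 \end{pmatrix} = 2\,\mathbf{v}_1 - \mathbf{v}_2 + 3\,\mathbf{v}_3 - 2\,\mathbf{v}_4 = 2 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix} + 3 \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix} - 2 \begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix}$$

in the wavelet basis. See Section 9.7 for the general theory of wavelet bases.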

In general, to find the coordinates of a vector $\mathbf{x}$ with respect to a new basis of $\mathbb{R}^n$ requires the solution of a linear system of equations, namely

$$A\,\mathbf{c} = \mathbf{x} \qquad \text{for} \qquad \mathbf{c} = A^{-1}\mathbf{x}. \tag{2.23}$$

The columns of $A = (\, \mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ \mathbf{v}_n \,)$ are the basis vectors, $\mathbf{x} = (\, x_1, x_2, \ldots, x_n \,)^T$ are the Cartesian coordinates of $\mathbf{x}$, with respect to the standard basis $\mathbf{e}_1, \ldots, \mathbf{e}_n$, while $\mathbf{c} = (\, c_1, c_2, \ldots, c_n \,)^T$ contains its coordinates with respect to the new basis $\mathbf{v}_1, \ldots, \mathbf{v}_n$. In practice, one finds the coordinates $\mathbf{c}$ by Gaussian Elimination, not matrix inversion.
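
The change of coordinates (2.23) can be carried out by elimination on the augmented matrix, without forming $A^{-1}$. The sketch below uses Gauss-Jordan elimination with exact fractions as one possible variant; the function name `coordinates` is an assumption of this example. It is applied to the wavelet basis (2.22) and the vector $\mathbf{x} = (\, 4, -2, 1, 5 \,)^T$ of Example 2.35.

```python
from fractions import Fraction

def coordinates(A, x):
    """Solve A c = x for a square nonsingular A by Gauss-Jordan elimination.

    A is a list of rows whose columns are the basis vectors; the result is the
    list of coordinates of x with respect to that basis.
    """
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(xi)] for row, xi in zip(A, x)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if M[i][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix: the columns do not form a basis")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [a / pivot_value for a in M[col]]  # scale the pivot row
        for i in range(n):
            if i != col:
                factor = M[i][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[col])]
    return [row[n] for row in M]

# Columns are the wavelet basis vectors v1, v2, v3, v4 of (2.22).
A = [[1, 1, 1, 0],
     [1, 1, -1, 0],
     [1, -1, 0, 1],
     [1, -1, 0, -1]]
x = [4, -2, 1, 5]

c = coordinates(A, x)
print([int(ci) for ci in c])  # [2, -1, 3, -2]
```

The computed coordinates agree with the values $c_1 = 2$, $c_2 = -1$, $c_3 = 3$, $c_4 = -2$ found in Example 2.35. Passing a matrix with linearly dependent columns raises an error, reflecting the fact that coordinates are defined only relative to a basis.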

Why  would  one  want  to  change  bases?  The  answer  is  simplification  and  speed  —  many 
computations  and  formulas  become  much  easier,  and  hence  faster,  to  perform  in  a  basis 
that  is  adapted  to  the  problem  at  hand.  In  signal  processing,  wavelet  bases  are  particularly 
appropriate  for  denoising,  compression,  and  efficient  storage  of  signals,  including  audio, 
still  images,  videos,  medical  and  geophysical  images,  and  so  on.  These  processes  would  be 
quite  time-consuming  —  if  not  impossible  in  complicated  situations  like  video  and  three- 
dimensional  image  processing  —  to  accomplish  in  the  standard  basis.  Additional  examples 
will  appear  throughout  the  text. 




Exercises 

2.4.1. Determine which of the following sets of vectors are bases of $\mathbb{R}^2$: (a)


1 

3 


-2 

5 


( b ) 


1 

-1 


1 

1 


;  (c) 


1 

2 


2 

1 


;  (d) 


3 

5 


0 

0 


;  0) 


2 

0 


1 

2 


0 

-1 


2.4.2. Determine which of the following are bases of $\mathbb{R}^3$: (a)


/ 1  \ 
3 

w 


2.4.3.  Let  v1  = 


;  (c) 


/ 1  \ 
o 

\2  J 


(2\ 

1 

W 
/  — 1  \ 


/1\ 

5 

\2  J 


;  O) 


o 


2\ 

o 

V-2 J  V -i/  o 
/  4 

r4  =  —  1  |.  (a)  Do  v1,v2,  v3,v4  span 

V  3 


R3?  Why  or  why  not?  (b)  Are  vl5  v2,v3,  v4  linearly  independent?  Why  or  why  not? 

Q 

(c)  Do  v1,v2,v3,v4  form  a  basis  for  R  ?  Why  or  why  not?  If  not,  is  it  possible  to  choose 
some  subset  that  is  a  basis?  (d)  What  is  the  dimension  of  the  span  of  v1?  v2,v3,v4? 
Justify  your  answer. 

2.4.4.  Answer  Exercise  2.4.3  when  v1  = 

V  2 )  \  5J  \  1 

Q 

2.4.5.  Find  a  basis  for  (a)  the  plane  given  by  the  equation  z  —  2y  =  0  in  R  ;  (b)  the  plane 

Q 

given  by  the  equation  4x  +  3y  —  z  =  0  in  R  ;  (c)  the  hyperplane  x  -\-  2y  -\-  z  —  w  =  0  in 


1\ 

(  2\ 

(  °^i 

-1 

>  v2  = 

-2 

>  v3  = 

-2 

>  v4  = 

3 

2  / 

V  5  J 

CD 

4  \  /  2\ 

2.4.6.  (a)  Show  that  |  0  ,  1 

i/  w 


,  and 


(  2^ 

-1 

V  1/ 


0\ 

2 

-d 


are  two  different  bases  for  the  plane 


x  —  2//  —  4z  =  0.  (b)  Show  how  to  write  both  elements  of  the  second  basis  as  linear 
combinations  of  the  first,  (c)  Can  you  find  a  third  basis? 

2.4.7. A basis $\mathbf{v}_1, \ldots, \mathbf{v}_n$ of $\mathbb{R}^n$ is called right-handed if the $n \times n$ matrix $A = (\, \mathbf{v}_1 \ \ldots \ \mathbf{v}_n \,)$ whose columns are the basis vectors has positive determinant: $\det A > 0$. If $\det A < 0$, the basis is called left-handed. (a) Which of the following form right-handed bases of $\mathbb{R}^3$?

1 


CP 

/  — 1  \ 

/  2  \ 

/n 

/ — a.  \ 

^  1\ 

/ 

(<) 

0 

5 

i 

5 

i 

,  (m) 

1 

2  , 

i 

,  (m) 

2 

-2 

1 

V  oJ 

Id 

V  i  / 

V  3y 

C2j 

\ 

(iv) 

(  3\ 
2 

5 

d. 

/  2  \ 
1 

d/ 

d/ 

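(b) Show that if $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ is a left-handed basis of $\mathbb{R}^3$, then $\mathbf{v}_2, \mathbf{v}_1, \mathbf{v}_3$ and $-\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ are both right-handed bases. (c) What sort of basis has $\det A = 0$?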

2.4.8.  Find  a  basis  for  and  the  dimension  of  the  following  subspaces:  (a)  The  space  of  solutions 

/ 1  2  —1  1  \ 

to  the  linear  system  Ax  =  0,  where  A=(^  ^  ^  ^  J .  (b)  The  set  of  all  quadratic 

polynomials  p(x)  =  ax  -\-bx  +  c  that  satisfy  p(l)  =  0.  (c)  The  space  of  all  solutions  to  the 
homogeneous  ordinary  differential  equation  u"  —  u'  +  Au  —  4u  =  0. 

2.4.9. (a) Prove that $1 + t^2$, $t + t^2$, $1 + 2t + t^2$ is a basis for the space of quadratic polynomials $\mathcal{P}^{(2)}$. (b) Find the coordinates of $p(t) = 1 + 4t + 7t^2$ in this basis.
2.4.10. Find a basis for and the dimension of the span of
/  3  \  /  — 6\  /  2  \  /  0\  /  2 


(a) 


V 


(*>) 


V 


(c) 


V 


/ 


v 


i\ 
0 
1 
2  J 


(  0\ 
1 
1 

v  3  y 


/  2  \ 
-1 
-3 

v  1/ 


(  i\ 
-2 
1 

v  i  y 


2.4.11. (a) Show that $1$, $1 - t$, $(1 - t)^2$, $(1 - t)^3$ is a basis for $\mathcal{P}^{(3)}$. (b) Write $p(t) = 1 + t^3$ in terms of the basis elements.

2.4.12. Let $\mathcal{P}^{(4)}$ denote the vector space consisting of all polynomials $p(x)$ of degree $\le 4$. (a) Are $x^3 - 3x + 1$, $x^4 - 6x + 3$, $x^4 - 2x^3 + 1$ linearly independent elements of $\mathcal{P}^{(4)}$? (b) What is the dimension of the subspace of $\mathcal{P}^{(4)}$ they span?

2.4.13.  Let  S  =  {  0,  |  }.  (a)  Show  that  the  sample  vectors  corresponding  to  the  functions 

1,  cos  7 rx,  cos  2 7 rx,  and  cos37tx  form  a  basis  for  the  vector  space  of  all  sample  functions  on 
S.  (b)  Write  the  sampled  version  of  the  function  /(x)  =  x  in  terms  of  this  basis. 

2.4.14. (a) Prove that the vector space of all $2 \times 2$ matrices is a four-dimensional vector space by exhibiting a basis. (b) Generalize your result and prove that the vector space $\mathcal{M}_{m \times n}$ consisting of all $m \times n$ matrices has dimension $mn$.


2.4.15.  Determine  all  values  of  the  scalar  k  for  which  the  following  four  matrices  form  a  basis 

1  -A  A  _  (k  -3\  A  _  (  1  0\  (  0  k 

0  0  j  ’  A2  -  l  1  0  ’  A3~  \-k  2  ’  A4~  [-1  -2 


for  M2x2:  ai  = 


2.4.16.  Prove  that  the  space  of  diagonal  n  x  n  matrices  is  an  n-dimensional  vector  space. 

2.4.17.  (a)  Find  a  basis  for  and  the  dimension  of  the  space  of  upper  triangular  2x2  matrices, 
(b)  Can  you  generalize  your  result  to  upper  triangular  n  x  n  matrices? 

2.4.18.  (a)  What  is  the  dimension  of  the  vector  space  of  2  x  2  symmetric  matrices?  Of  skew- 
symmetric  matrices?  (b)  Generalize  to  the  3x3  case,  (c)  What  about  n  x  n  matrices? 

♥ 2.4.19. A matrix is said to be a semi-magic square if its row sums and column sums (i.e., the sum of entries in an individual row or column) all add up to the same number. An example is $\begin{pmatrix} 8 & 1 & 6 \\ 3 & 5 & 7 \\ 4 & 9 & 2 \end{pmatrix}$, whose row and column sums are all equal to 15. (a) Explain why the set of all semi-magic squares is a subspace of the vector space of $3 \times 3$ matrices. (b) Prove that the $3 \times 3$ permutation matrices (1.30) span the space of semi-magic squares. What is its dimension? (c) A magic square also has the diagonal and anti-diagonal (running from top right to bottom left) add up to the common row and column sum; the preceding $3 \times 3$ example is magic. Does the set of $3 \times 3$ magic squares form a vector space? If so, what is its dimension? (d) Write down a formula for all $3 \times 3$ magic squares.


♦ 2.4.20. (a) Prove that if $\mathbf{v}_1, \ldots, \mathbf{v}_m$ forms a basis for $V \subsetneq \mathbb{R}^n$, then $m < n$. (b) Under the hypothesis of part (a), prove that there exist vectors $\mathbf{v}_{m+1}, \ldots, \mathbf{v}_n \in \mathbb{R}^n \setminus V$ such that the complete collection $\mathbf{v}_1, \ldots, \mathbf{v}_n$ forms a basis for $\mathbb{R}^n$. (c) Illustrate by constructing bases of $\mathbb{R}^3$ that include (i) the basis $(\, 1, 1, \tfrac{1}{2} \,)^T$ of the line $x = y = 2z$; (ii) the basis $(\, 1, 0, -1 \,)^T$, $(\, 0, 1, -2 \,)^T$ of the plane $x + 2y + z = 0$.

♦ 2.4.21. Suppose that $\mathbf{v}_1, \ldots, \mathbf{v}_n$ form a basis for $\mathbb{R}^n$. Let $A$ be a nonsingular matrix. Prove that $A\mathbf{v}_1, \ldots, A\mathbf{v}_n$ also form a basis for $\mathbb{R}^n$. What is this basis if you start with the standard basis: $\mathbf{v}_i = \mathbf{e}_i$?


0 2.4.22. Show that if $\mathbf{v}_1, \dots, \mathbf{v}_n$ span $V \neq \{\mathbf{0}\}$, then one can choose a subset $\mathbf{v}_{i_1}, \dots, \mathbf{v}_{i_m}$ that forms a basis of $V$. Thus, $\dim V = m \leq n$. Under what conditions is $\dim V = n$?

0 2.4.23. Prove that if $\mathbf{v}_1, \dots, \mathbf{v}_n$ are a basis of $V$, then every subset thereof, e.g., $\mathbf{v}_{i_1}, \dots, \mathbf{v}_{i_k}$, is linearly independent.

0 2.4.24. Show, by example, how the uniqueness result in Lemma 2.34 fails if one has a linearly dependent set of vectors.

0 2.4.25. Let $W \subset V$ be a subspace. (a) Prove that $\dim W \leq \dim V$.

(b) Prove that if $\dim W = \dim V = n < \infty$, then $W = V$. Equivalently, if $W \subsetneq V$ is a proper subspace of a finite-dimensional vector space, then $\dim W < \dim V$.

(c) Give an example in which the result is false if $\dim V = \infty$.

0 2.4.26. Let $W, Z \subset V$ be complementary subspaces in a finite-dimensional vector space $V$, as in Exercise 2.2.24. (a) Prove that if $\mathbf{w}_1, \dots, \mathbf{w}_j$ form a basis for $W$ and $\mathbf{z}_1, \dots, \mathbf{z}_k$ a basis for $Z$, then $\mathbf{w}_1, \dots, \mathbf{w}_j, \mathbf{z}_1, \dots, \mathbf{z}_k$ form a basis for $V$. (b) Prove that $\dim W + \dim Z = \dim V$.

0 2.4.27. Let $V$ be a finite-dimensional vector space and $W \subset V$ a subspace. Prove that the quotient space, as defined in Exercise 2.2.29, has dimension $\dim(V/W) = \dim V - \dim W$.

0 2.4.28. Let $f_1(x), \dots, f_n(x)$ be scalar functions. Suppose that every set of sample points $x_1, \dots, x_m \in \mathbb{R}$, for all finite $m \geq 1$, leads to linearly dependent sample vectors $\mathbf{f}_1, \dots, \mathbf{f}_n \in \mathbb{R}^m$. Prove that $f_1(x), \dots, f_n(x)$ are linearly dependent functions.

Hint: Given sample points $x_1, \dots, x_m$, let $V_{x_1, \dots, x_m} \subset \mathbb{R}^n$ be the subspace consisting of all vectors $\mathbf{c} = (c_1, c_2, \dots, c_n)^T$ such that $c_1 \mathbf{f}_1 + \cdots + c_n \mathbf{f}_n = \mathbf{0}$. First, show that one can select sample points $x_1, x_2, x_3, \dots$ such that $\mathbb{R}^n \supsetneq V_{x_1} \supsetneq V_{x_1, x_2} \supsetneq \cdots$. Then, apply Exercise 2.4.25 to conclude that $V_{x_1, \dots, x_n} = \{\mathbf{0}\}$.


2.5  The  Fundamental  Matrix  Subspaces 

Let  us  now  return  to  the  general  study  of  linear  systems  of  equations,  which  we  write  in 
our  usual  matrix  form 

$$A\mathbf{x} = \mathbf{b}. \qquad (2.24)$$

As before, $A$ is an $m \times n$ matrix, where $m$ is the number of equations, so $\mathbf{b} \in \mathbb{R}^m$, and
$n$ is the number of unknowns, i.e., the entries of $\mathbf{x} \in \mathbb{R}^n$. We already know how to solve
the  system,  at  least  when  the  coefficient  matrix  is  not  too  large:  just  apply  a  variant  of 
Gaussian  Elimination.  Our  goal  now  is  to  better  understand  the  solution(s)  and  thereby 
prepare  ourselves  for  more  sophisticated  problems  and  solution  techniques. 

Kernel  and  Image 

There  are  four  important  vector  subspaces  associated  with  any  matrix.  The  first  two  are 
defined  as  follows. 

Definition 2.36. The image of an $m \times n$ matrix $A$ is the subspace $\operatorname{img} A \subset \mathbb{R}^m$ spanned by its columns. The kernel of $A$ is the subspace $\ker A \subset \mathbb{R}^n$ consisting of all vectors that are annihilated by $A$, so

$$\ker A = \{\, \mathbf{z} \in \mathbb{R}^n \mid A\mathbf{z} = \mathbf{0} \,\} \subset \mathbb{R}^n. \qquad (2.25)$$

The image is also known as the column space or the range† of the matrix. By definition, a vector $\mathbf{b} \in \mathbb{R}^m$ belongs to $\operatorname{img} A$ if it can be written as a linear combination,

$$\mathbf{b} = x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n,$$

of the columns of $A = (\mathbf{v}_1 \ \mathbf{v}_2 \ \dots \ \mathbf{v}_n)$. By our basic matrix multiplication formula (2.13), the right-hand side of this equation equals the product $A\mathbf{x}$ of the matrix $A$ with the column vector $\mathbf{x} = (x_1, x_2, \dots, x_n)^T$, and hence $\mathbf{b} = A\mathbf{x}$ for some $\mathbf{x} \in \mathbb{R}^n$. Thus,

$$\operatorname{img} A = \{\, A\mathbf{x} \mid \mathbf{x} \in \mathbb{R}^n \,\} \subset \mathbb{R}^m, \qquad (2.26)$$

and so a vector $\mathbf{b}$ lies in the image of $A$ if and only if the linear system $A\mathbf{x} = \mathbf{b}$ has a solution. The compatibility conditions for linear systems can thereby be re-interpreted as the requirements for a vector to lie in the image of the coefficient matrix.

† The latter term can be confusing, since some authors call all of $\mathbb{R}^m$ the range of the (function defined by the) matrix, hence our preference to use image here, and, later, codomain to refer to the space $\mathbb{R}^m$. On the other hand, the space $\mathbb{R}^n$ will be called the domain of the (function defined by the) matrix.

A common alternative name for the kernel is the null space. The kernel or null space of $A$ is the set of solutions $\mathbf{z}$ to the homogeneous system $A\mathbf{z} = \mathbf{0}$. The proof that $\ker A$ is a subspace requires us to verify the usual closure conditions: suppose that $\mathbf{z}, \mathbf{w} \in \ker A$, so that $A\mathbf{z} = \mathbf{0} = A\mathbf{w}$. Then, by the compatibility of scalar and matrix multiplication, for any scalars $c, d$,

$$A(c\,\mathbf{z} + d\,\mathbf{w}) = c\,A\mathbf{z} + d\,A\mathbf{w} = \mathbf{0},$$

which implies that $c\,\mathbf{z} + d\,\mathbf{w} \in \ker A$. Closure of $\ker A$ can be re-expressed as the following important superposition principle for solutions to a homogeneous system of linear equations.

Theorem 2.37. If $\mathbf{z}_1, \dots, \mathbf{z}_k$ are individual solutions to the same homogeneous linear system $A\mathbf{z} = \mathbf{0}$, then so is every linear combination $c_1 \mathbf{z}_1 + \cdots + c_k \mathbf{z}_k$.

Warning. The set of solutions to an inhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ with $\mathbf{b} \neq \mathbf{0}$ is
not a subspace. Linear combinations of solutions are not, in general, solutions to the same
inhomogeneous system.
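
A short computation may help make the warning concrete. Suppose, purely for illustration, that $\mathbf{x}_1$ and $\mathbf{x}_2$ both solve the same inhomogeneous system with $\mathbf{b} \neq \mathbf{0}$. Then

$$A(\mathbf{x}_1 + \mathbf{x}_2) = A\mathbf{x}_1 + A\mathbf{x}_2 = \mathbf{b} + \mathbf{b} = 2\,\mathbf{b} \neq \mathbf{b}, \qquad A\,\mathbf{0} = \mathbf{0} \neq \mathbf{b},$$

so the solution set is not closed under addition and does not even contain the zero vector.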

Superposition  is  the  reason  why  linear  systems  are  so  much  easier  to  solve,  since  one 
needs  to  find  only  relatively  few  solutions  in  order  to  construct  the  general  solution  as  a 
linear  combination.  In  Chapter  7  we  shall  see  that  superposition  applies  to  completely 
general  linear  systems,  including  linear  differential  equations,  both  ordinary  and  partial; 
linear  boundary  value  problems;  linear  integral  equations;  linear  control  systems;  etc. 

Example 2.38. Let us compute the kernel of the matrix

$$A = \begin{pmatrix} 1 & -2 & 0 & 3 \\ 2 & -3 & -1 & -4 \\ 3 & -5 & -1 & -1 \end{pmatrix}.$$

Our task is to solve the homogeneous system $A\mathbf{x} = \mathbf{0}$, so we need only perform the elementary row operations on $A$ itself. The resulting row echelon form

$$U = \begin{pmatrix} 1 & -2 & 0 & 3 \\ 0 & 1 & -1 & -10 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

corresponds to the equations $x - 2y + 3w = 0$, $y - z - 10w = 0$. The free variables are $z, w$, and the general solution is

$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 2z + 17w \\ z + 10w \\ z \\ w \end{pmatrix} = z \begin{pmatrix} 2 \\ 1 \\ 1 \\ 0 \end{pmatrix} + w \begin{pmatrix} 17 \\ 10 \\ 0 \\ 1 \end{pmatrix}.$$

The result describes the most general vector in $\ker A$, which is thus the two-dimensional subspace of $\mathbb{R}^4$ spanned by the linearly independent vectors $(2, 1, 1, 0)^T$, $(17, 10, 0, 1)^T$. This example is indicative of a general method for finding a basis for $\ker A$, to be developed in more detail below.
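
The computation in Example 2.38 can be mirrored in a few lines of code. The following is only a minimal sketch, assuming exact rational arithmetic and using the reduced row echelon form rather than the unreduced form U of the text; the helper names `rref` and `kernel_basis` are hypothetical and not taken from the book.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): the reduced row echelon form and its pivot columns."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n):
        # look for a nonzero entry in column c, at or below row r
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                      # no pivot: column c is a free column
        M[r], M[p] = M[p], M[r]           # row interchange
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]    # scale the pivot to 1
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return M, pivots

def kernel_basis(rows):
    """One basis vector of ker A per free variable (free variable = 1, others = 0)."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -R[r][f]               # solve the pivot equation for the basic variable
        basis.append(v)
    return basis

A = [[1, -2, 0, 3], [2, -3, -1, -4], [3, -5, -1, -1]]
basis = kernel_basis(A)
print(basis)
```

For the matrix of Example 2.38 the two vectors produced are (2, 1, 1, 0) and (17, 10, 0, 1), in agreement with the hand computation.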


Once  we  know  the  kernel  of  the  coefficient  matrix  A,  i.e.,  the  space  of  solutions  to  the 
homogeneous  system  A  z  =  0,  we  are  able  to  completely  characterize  the  solutions  to  the 
inhomogeneous  linear  system  (2.24). 

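Theorem 2.39. The linear system $A\mathbf{x} = \mathbf{b}$ has a solution $\mathbf{x}^*$ if and only if $\mathbf{b}$ lies in the image of $A$. If this occurs, then $\mathbf{x}$ is a solution to the linear system if and only if

$$\mathbf{x} = \mathbf{x}^* + \mathbf{z}, \qquad (2.27)$$

where $\mathbf{z} \in \ker A$ is an element of the kernel of the coefficient matrix.

Proof: We already demonstrated the first part of the theorem. If $A\mathbf{x} = \mathbf{b} = A\mathbf{x}^*$ are any two solutions, then their difference $\mathbf{z} = \mathbf{x} - \mathbf{x}^*$ satisfies

$$A\mathbf{z} = A(\mathbf{x} - \mathbf{x}^*) = A\mathbf{x} - A\mathbf{x}^* = \mathbf{b} - \mathbf{b} = \mathbf{0},$$

and hence $\mathbf{z}$ is in the kernel of $A$. Therefore, $\mathbf{x}$ and $\mathbf{x}^*$ are related by formula (2.27). Conversely, if $\mathbf{z} \in \ker A$, then $A(\mathbf{x}^* + \mathbf{z}) = A\mathbf{x}^* + A\mathbf{z} = \mathbf{b}$, so every vector of the form (2.27) is a solution, which proves the second part of the theorem. Q.E.D.


Therefore, to construct the most general solution to an inhomogeneous system, we need
only know one particular solution $\mathbf{x}^*$, along with the general solution $\mathbf{z} \in \ker A$ to the
corresponding homogeneous system. This construction should remind the reader of the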
method  for  solving  inhomogeneous  linear  ordinary  differential  equations.  Indeed,  both 
linear  algebraic  systems  and  linear  ordinary  differential  equations  are  but  two  particular 
instances  in  the  general  theory  of  linear  systems,  to  be  developed  in  Chapter  7. 


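Example 2.40. Consider the system $A\mathbf{x} = \mathbf{b}$, where

$$A = \begin{pmatrix} 1 & 0 & -1 \\ -1 & 1 & -1 \\ 1 & -2 & 3 \end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix},$$

where the right-hand side of the system will remain unspecified for the moment. Applying our usual Gaussian Elimination procedure to the augmented matrix

$$\left(\begin{array}{ccc|c} 1 & 0 & -1 & b_1 \\ -1 & 1 & -1 & b_2 \\ 1 & -2 & 3 & b_3 \end{array}\right) \quad \text{leads to the row echelon form} \quad \left(\begin{array}{ccc|c} 1 & 0 & -1 & b_1 \\ 0 & 1 & -2 & b_1 + b_2 \\ 0 & 0 & 0 & b_3 + 2b_2 + b_1 \end{array}\right).$$

Therefore, the system has a solution if and only if the compatibility condition

$$b_1 + 2b_2 + b_3 = 0 \qquad (2.28)$$

holds. This equation serves to characterize the vectors $\mathbf{b}$ that belong to the image of the matrix $A$, which is therefore a plane in $\mathbb{R}^3$.

To characterize the kernel of $A$, we take $\mathbf{b} = \mathbf{0}$, and solve the homogeneous system $A\mathbf{z} = \mathbf{0}$. The row echelon form corresponds to the reduced system

$$z_1 - z_3 = 0, \qquad z_2 - 2z_3 = 0.$$

The free variable is $z_3$, and the equations are solved to give

$$z_1 = c, \qquad z_2 = 2c, \qquad z_3 = c,$$

where $c$ is an arbitrary scalar. Thus, the general solution to the homogeneous system is $\mathbf{z} = (c, 2c, c)^T = c\,(1, 2, 1)^T$, and so the kernel is the line in the direction of the vector $(1, 2, 1)^T$.

If we take $\mathbf{b} = (3, -2, 1)^T$, which satisfies (2.28) and hence lies in the image of $A$, then the general solution to the inhomogeneous system $A\mathbf{x} = \mathbf{b}$ is

$$x_1 = 3 + c, \qquad x_2 = 1 + 2c, \qquad x_3 = c,$$

where $c$ is arbitrary. We can write the solution in the form (2.27), namely

$$\mathbf{x} = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix} + c \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} = \mathbf{x}^* + \mathbf{z}, \qquad (2.29)$$

where, as in (2.27), $\mathbf{x}^* = (3, 1, 0)^T$ plays the role of the particular solution, while $\mathbf{z} = c\,(1, 2, 1)^T$ is the general element of the kernel.

Finally, we remark that the particular solution is not uniquely defined: any individual solution to the system will serve the purpose. Thus, in this example, we could choose, for instance, $\mathbf{x}^{**} = (-2, -9, -5)^T$ instead, corresponding to $c = -5$ in the preceding formula (2.29). The general solution can be expressed in the alternative form

$$\mathbf{x} = \mathbf{x}^{**} + \mathbf{z} = \begin{pmatrix} -2 \\ -9 \\ -5 \end{pmatrix} + \tilde{c} \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \qquad \text{where} \quad \mathbf{z} = \tilde{c} \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \in \ker A,$$

which agrees with (2.29) when we identify $\tilde{c} = c + 5$.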
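
The structure "particular solution plus kernel element" is easy to test numerically. The sketch below assumes that the coefficient matrix of Example 2.40 is the one consistent with the row echelon form, the kernel direction (1, 2, 1), and the solution formulas quoted in the example; that matrix and the helper names `matvec` and `solution` are assumptions made for illustration.

```python
def matvec(A, x):
    """Matrix-vector product A x for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Assumed coefficient matrix, right-hand side, particular solution, kernel direction.
A = [[1, 0, -1],
     [-1, 1, -1],
     [1, -2, 3]]
b = [3, -2, 1]
x_star = [3, 1, 0]
z = [1, 2, 1]

def solution(c):
    """The vector x* + c z, as in formula (2.29)."""
    return [p + c * k for p, k in zip(x_star, z)]

# Every choice of c should reproduce the same right-hand side b.
checks = [matvec(A, solution(c)) == b for c in range(-5, 6)]
print(all(checks), matvec(A, z), solution(-5))
```

The value c = -5 yields the alternative particular solution (-2, -9, -5) mentioned in the example, and A z is the zero vector, as it must be for an element of the kernel.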


We can characterize the situations in which the linear system has a unique solution in any of the following equivalent ways.

Proposition 2.41. If $A$ is an $m \times n$ matrix, then the following conditions are equivalent:
(i) $\ker A = \{\mathbf{0}\}$, i.e., the homogeneous system $A\mathbf{x} = \mathbf{0}$ has the unique solution $\mathbf{x} = \mathbf{0}$.
(ii) $\operatorname{rank} A = n$.
(iii) The linear system $A\mathbf{x} = \mathbf{b}$ has no free variables.
(iv) The system $A\mathbf{x} = \mathbf{b}$ has a unique solution for each $\mathbf{b} \in \operatorname{img} A$.


Thus, while existence of a solution may depend upon the particularities of the right-hand side $\mathbf{b}$, uniqueness is universal: if for any one $\mathbf{b}$, e.g., $\mathbf{b} = \mathbf{0}$, the system admits a unique solution, then all $\mathbf{b} \in \operatorname{img} A$ also admit unique solutions. Specializing even further to square matrices, we can now characterize invertible matrices by looking at either their kernels or their images.

Proposition 2.42. If $A$ is a square $n \times n$ matrix, then the following four conditions are equivalent: (i) $A$ is nonsingular; (ii) $\operatorname{rank} A = n$; (iii) $\ker A = \{\mathbf{0}\}$; (iv) $\operatorname{img} A = \mathbb{R}^n$.

Exercises

2.5.1.  Characterize  the  image  and  kernel  of  the  following  matrices: 


(a) 


8 

-6 


4 

3 


(b) 


1 

-2 


1  2 
2  -4 


).  (c) 

1 

-2 

2 

4 

h-1  co 

,  (d) 

;  V 

4 

0 

5  J 

( 


\ 


1 

1 

1 

1 


-1 

0 

-2 

2 


0 

1 

1 

-3 


1\ 
1 
1 
1/ 


2.5.2.  For  the  following  matrices,  write  the  kernel  as  the  span  of  a  finite  number  of  vectors. 


Q 

Is  the  kernel  a  point,  line,  plane,  or  all  of  IR  ?  (a)  (2  —1  5),  (b) 


(c) 


2 

1 


6  -4 
3  2 


( d ) 


(  1 

2 

5  \ 

( 

2 

-1 

i\ 

0 

4 

8 

,  (e) 

-1 

1 

-2 

Vi 

-6 

-ll) 

K 

3 

-1 

L 

(0 


1  2-1 
3-2  0 

/  1  -2  3  \ 

-3  6  -9 

-2  4  -6 

V  3  0-1/ 


2.5.3. (a) Find the kernel and image of the coefficient matrix for the system $x - 3y + 2z = a$, $2x - 6y + 2w = b$, $z - 3w = c$. (b) Write down compatibility conditions on $a, b, c$ for a solution to exist.


2.5.4. Suppose $\mathbf{x}^* = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$ is a particular solution to the equation $\begin{pmatrix} 1 & -1 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & -1 \end{pmatrix} \mathbf{x} = \mathbf{b}$.

(a) What is $\mathbf{b}$? (b) Find the general solution.

2.5.5. Prove that the average of all the entries in each row of $A$ is 0 if and only if $(1, 1, \dots, 1)^T \in \ker A$.

2.5.6. True or false: If $A$ is a square matrix, then $\ker A \cap \operatorname{img} A = \{\mathbf{0}\}$.

2.5.7.  Write  the  general  solution  to  the  following  linear  systems  in  the  form  (2.27).  Clearly 
identify  the  particular  solution  x*  and  the  element  z  of  the  kernel,  (a)  x  —  y  3z  =  1, 

( x\  /  o\ 

y  =  _i  ’  (c) 

W  v  ’ 


(b) 


1 

2 


-2 

3 


0 

1 


/I 

-1 

0\ 

(  x^ 

/ — i  \ 

2 

0 

-4 

y 

— 

-6 

c 

-1 

-2/ 

W 

^  — 4/ 

0 d ) 


(0 


( 


\ 


1 

2 

-3 

-1 

te) 


-2\ 
-4 
6 
2/ 
/  0 
1 

-2 

V  1 


u 

V 

-1 

-3 

5 

1 


/  — 1  \ 
-2 
3 

V  lj 


2 

0 

2 

-8 


1\ 
1 
3 
5/ 


/  x  \ 

y 

z 

\w  / 


(  —2  \ 
-3 
4 

V  5/ 


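2.5.8. Given $a, r \neq 0$, characterize the kernel and the image of the matrix

$$\begin{pmatrix} a & ar & \cdots & ar^{n-1} \\ ar^{n} & ar^{n+1} & \cdots & ar^{2n-1} \\ \vdots & \vdots & \ddots & \vdots \\ ar^{(n-1)n} & ar^{(n-1)n+1} & \cdots & ar^{n^2-1} \end{pmatrix}.$$

Hint: See Exercise 1.8.17.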


0 2.5.9. Let the square matrix $P$ be idempotent, meaning that $P^2 = P$. (a) Prove that $\mathbf{w} \in \operatorname{img} P$ if and only if $P\mathbf{w} = \mathbf{w}$. (b) Show that $\operatorname{img} P$ and $\ker P$ are complementary subspaces, as defined in Exercise 2.2.24, so every $\mathbf{v} \in \mathbb{R}^n$ can be uniquely written as $\mathbf{v} = \mathbf{w} + \mathbf{z}$ where $\mathbf{w} \in \operatorname{img} P$, $\mathbf{z} \in \ker P$.

0 2.5.10. Let $A$ be an $m \times n$ matrix. Suppose that $C = \begin{pmatrix} A \\ B \end{pmatrix}$ is an $(m + k) \times n$ matrix whose first $m$ rows are the same as those of $A$. Prove that $\ker C \subseteq \ker A$. Thus, appending more rows cannot increase the size of a matrix's kernel. Give an example in which $\ker C \neq \ker A$.

0 2.5.11. Let $A$ be an $m \times n$ matrix. Suppose that $C = (A \ \ B)$ is an $m \times (n + k)$ matrix whose first $n$ columns are the same as those of $A$. Prove that $\operatorname{img} C \supseteq \operatorname{img} A$. Thus, appending more columns cannot decrease the size of a matrix's image. Give an example in which $\operatorname{img} C \neq \operatorname{img} A$.


The  Superposition  Principle 

The  principle  of  superposition  lies  at  the  heart  of  linearity.  For  homogeneous  systems, 
superposition  allows  one  to  generate  new  solutions  by  combining  known  solutions.  For 
inhomogeneous  systems,  superposition  combines  the  solutions  corresponding  to  different 
inhomogeneities. 

Suppose we know particular solutions $\mathbf{x}_1^*$ and $\mathbf{x}_2^*$ to two inhomogeneous linear systems

$$A\mathbf{x} = \mathbf{b}_1, \qquad A\mathbf{x} = \mathbf{b}_2,$$

that have the same coefficient matrix $A$. Consider the system

$$A\mathbf{x} = c_1 \mathbf{b}_1 + c_2 \mathbf{b}_2,$$

whose right-hand side is a linear combination, or superposition, of the previous two. Then a particular solution to the combined system is given by the same superposition of the previous solutions:

$$\mathbf{x}^* = c_1 \mathbf{x}_1^* + c_2 \mathbf{x}_2^*.$$

The proof is easy:

$$A\mathbf{x}^* = A(c_1 \mathbf{x}_1^* + c_2 \mathbf{x}_2^*) = c_1 A\mathbf{x}_1^* + c_2 A\mathbf{x}_2^* = c_1 \mathbf{b}_1 + c_2 \mathbf{b}_2.$$

In physical applications, the inhomogeneities $\mathbf{b}_1, \mathbf{b}_2$ typically represent external forces, and the solutions $\mathbf{x}_1^*, \mathbf{x}_2^*$ represent the respective responses of the physical apparatus. The linear superposition principle says that if we know how the system responds to the individual forces, we immediately know its response to any combination thereof. The precise details of the system are irrelevant; all that is required is its linearity.

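Example 2.43. For example, the system

$$\begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix}$$

models the mechanical response of a pair of masses connected by springs, subject to external forcing. The solution $\mathbf{x} = (x_1, x_2)^T$ represents the displacements of the masses, while the entries of the right-hand side $\mathbf{f} = (f_1, f_2)^T$ are the applied forces. (Details can be found in Chapter 6.) We can directly determine the response of the system $\mathbf{x}_1^* = \left(\tfrac{4}{15}, -\tfrac{1}{15}\right)^T$ to a unit force $\mathbf{e}_1 = (1, 0)^T$ on the first mass, and the response $\mathbf{x}_2^* = \left(-\tfrac{1}{15}, \tfrac{4}{15}\right)^T$ to a unit force $\mathbf{e}_2 = (0, 1)^T$ on the second mass. Superposition gives the response of the system to a general force, since we can write

$$\mathbf{f} = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix} = f_1 \mathbf{e}_1 + f_2 \mathbf{e}_2,$$

and hence

$$\mathbf{x} = f_1 \mathbf{x}_1^* + f_2 \mathbf{x}_2^* = f_1 \begin{pmatrix} \tfrac{4}{15} \\ -\tfrac{1}{15} \end{pmatrix} + f_2 \begin{pmatrix} -\tfrac{1}{15} \\ \tfrac{4}{15} \end{pmatrix} = \begin{pmatrix} \tfrac{4}{15} f_1 - \tfrac{1}{15} f_2 \\ -\tfrac{1}{15} f_1 + \tfrac{4}{15} f_2 \end{pmatrix}.$$

The preceding construction is easily extended to several inhomogeneities, and the result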
is the general Superposition Principle for inhomogeneous linear systems.

Theorem 2.44. Suppose that $\mathbf{x}_1^*, \dots, \mathbf{x}_k^*$ are particular solutions to each of the inhomogeneous linear systems

$$A\mathbf{x} = \mathbf{b}_1, \qquad A\mathbf{x} = \mathbf{b}_2, \qquad \dots \qquad A\mathbf{x} = \mathbf{b}_k, \qquad (2.30)$$

all having the same coefficient matrix, and where $\mathbf{b}_1, \dots, \mathbf{b}_k \in \operatorname{img} A$. Then, for any choice of scalars $c_1, \dots, c_k$, a particular solution to the combined system

$$A\mathbf{x} = c_1 \mathbf{b}_1 + \cdots + c_k \mathbf{b}_k \qquad (2.31)$$

is the corresponding superposition

$$\mathbf{x}^* = c_1 \mathbf{x}_1^* + \cdots + c_k \mathbf{x}_k^* \qquad (2.32)$$

of individual solutions. The general solution to (2.31) is

$$\mathbf{x} = \mathbf{x}^* + \mathbf{z} = c_1 \mathbf{x}_1^* + \cdots + c_k \mathbf{x}_k^* + \mathbf{z}, \qquad (2.33)$$

where $\mathbf{z} \in \ker A$ is the general solution to the homogeneous system $A\mathbf{z} = \mathbf{0}$.

For instance, if we know particular solutions $\mathbf{x}_1^*, \dots, \mathbf{x}_m^*$ to

$$A\mathbf{x} = \mathbf{e}_i \qquad \text{for each} \quad i = 1, \dots, m, \qquad (2.34)$$

where $\mathbf{e}_1, \dots, \mathbf{e}_m$ are the standard basis vectors of $\mathbb{R}^m$, then we can reconstruct a particular solution $\mathbf{x}^*$ to the general linear system $A\mathbf{x} = \mathbf{b}$ by first writing

$$\mathbf{b} = b_1 \mathbf{e}_1 + \cdots + b_m \mathbf{e}_m$$

as a linear combination of the basis vectors, and then using superposition to form

$$\mathbf{x}^* = b_1 \mathbf{x}_1^* + \cdots + b_m \mathbf{x}_m^*. \qquad (2.35)$$

However, for linear algebraic systems, the practical value of this insight is rather limited. In the case that $A$ is square and nonsingular, the superposition formula (2.35) is merely a reformulation of the method of computing the inverse of the matrix. Indeed, the vectors $\mathbf{x}_1^*, \dots, \mathbf{x}_m^*$ that satisfy (2.34) are just the columns of $A^{-1}$ (why?), while (2.35) is precisely the solution formula $\mathbf{x}^* = A^{-1}\mathbf{b}$ that we abandoned in practical computations, in favor of the more efficient Gaussian Elimination process. Nevertheless, this idea turns out to have important implications in more general situations, such as linear differential equations and boundary value problems.
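
As a small illustration of formulas (2.34) and (2.35), the following sketch solves the unit-force systems for a 2 x 2 matrix and compares the superposed solution with a direct solve. The matrix, the right-hand side, and the helper name `solve` are hypothetical choices made for this illustration; the solver is a plain Gauss-Jordan elimination in exact arithmetic, assumed adequate for small nonsingular systems.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square nonsingular system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        # first row at or below c with a nonzero entry in column c
        # (next() raises StopIteration if A is singular)
        p = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [a - f * r for a, r in zip(M[i], M[c])]
    return [M[i][n] for i in range(n)]

A = [[2, 1], [1, 3]]          # hypothetical nonsingular coefficient matrix
n = len(A)

# Responses to the standard basis vectors e_1, ..., e_n, as in (2.34).
unit_responses = [solve(A, [1 if i == j else 0 for i in range(n)]) for j in range(n)]

b = [5, -4]                   # hypothetical right-hand side
# Superposition (2.35): x* = b_1 x_1* + ... + b_n x_n*.
x_super = [sum(b[j] * unit_responses[j][i] for j in range(n)) for i in range(n)]
x_direct = solve(A, b)
print(x_super == x_direct, unit_responses)
```

The list `unit_responses` holds the columns of the inverse matrix, so the superposition step amounts to multiplying b by the inverse, which is why the text regards it as no improvement over Gaussian Elimination for algebraic systems.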


Exercises 


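2.5.12. Find the solution $\mathbf{x}_1^*$ to the system $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, and the solution $\mathbf{x}_2^*$ to $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Express the solution to $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \end{pmatrix}$ as a linear combination of $\mathbf{x}_1^*$ and $\mathbf{x}_2^*$.

( 1  2  -i\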

(  5i 

fi\ 

2.5.13.  Let  A  = 

2  5  —  1  .  Given  that  = 

-1 

solves  Ax  =  b1  = 

3 

0  3  2  j 

'V  2  / 

V6/ 

and 


/-n\ 

x2  — 

5 

solves  Ax 

=  b 

^  -E 

m 

2.5.14.  (a)  Show  that  x|  = 

1 

{oj 

and  x2 


,  find  a  solution  to  Ax  =  2^  +  b2 


/-3\ 

3 

v— 2 ; 


are  particular  solutions  to  the  system 


(2 

1 

-1 

-4 

-5\ 

-6 

X  = 

/  1\ 

—3  .  ( b )  Find  the  general  solution 

\3 

2 

~4j 

V  5  J 

2.5.15.  A  physical  apparatus  moves  2  meters  under  a  force  of  4  newtons.  Assuming  linearity, 
how  far  will  it  move  under  a  force  of  10  newtons? 


2.5.16.  Applying  a  unit  external  force  in  the  horizontal  direction  moves  a  mass  3  units  to  the 
right,  while  applying  a  unit  force  in  the  vertical  direction  moves  it  up  2  units.  Assuming 

linearity,  where  will  the  mass  move  under  the  applied  force  f  =  ( 2,  —  3  )T? 


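2.5.17. Suppose $\mathbf{x}_1^*$ and $\mathbf{x}_2^*$ are both solutions to $A\mathbf{x} = \mathbf{b}$. List all linear combinations of $\mathbf{x}_1^*$ and $\mathbf{x}_2^*$ that solve the system.


0 2.5.18. Let $A$ be a nonsingular $m \times m$ matrix. (a) Explain in detail why the solutions $\mathbf{x}_1^*, \dots, \mathbf{x}_m^*$ to the systems (2.34) are the columns of the matrix inverse $A^{-1}$.

(b) Illustrate your argument in the case $A = \begin{pmatrix} 0 & 1 & 2 \\ -1 & 1 & 3 \\ 1 & 0 & 1 \end{pmatrix}$.


2.5.19. True or false: If $\mathbf{x}_1^*$ solves $A\mathbf{x} = \mathbf{c}$, and $\mathbf{x}_2^*$ solves $B\mathbf{x} = \mathbf{d}$, then $\mathbf{x}^* = \mathbf{x}_1^* + \mathbf{x}_2^*$ solves $(A + B)\mathbf{x} = \mathbf{c} + \mathbf{d}$.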

0  2.5.20.  Under  what  conditions  on  the  coefficient  matrix  A  will  the  systems  in  (2.34)  all  have  a 
solution? 


Adjoint  Systems,  Cokernel,  and  Coimage 

A linear system of $m$ equations in $n$ unknowns is based on an $m \times n$ coefficient matrix $A$. The transposed matrix $A^T$ will be of size $n \times m$, and forms the coefficient matrix of an associated linear system, consisting of $n$ equations in $m$ unknowns.

Definition 2.45. The adjoint† to a linear system $A\mathbf{x} = \mathbf{b}$ of $m$ equations in $n$ unknowns is the linear system

$$A^T \mathbf{y} = \mathbf{f} \qquad (2.36)$$

consisting of $n$ equations in $m$ unknowns $\mathbf{y} \in \mathbb{R}^m$ with right-hand side $\mathbf{f} \in \mathbb{R}^n$.

† Warning. Some texts misuse the term "adjoint" to describe the adjugate or cofactor matrix, [80]. The constructions are completely unrelated, and the adjugate will play no role in this book.

Example 2.46. Consider the linear system

$$\begin{aligned} x_1 - 3x_2 - 7x_3 + 9x_4 &= b_1, \\ x_2 + 5x_3 - 3x_4 &= b_2, \\ x_1 - 2x_2 - 2x_3 + 6x_4 &= b_3, \end{aligned} \qquad (2.37)$$

of three equations in four unknowns. Its coefficient matrix

$$A = \begin{pmatrix} 1 & -3 & -7 & 9 \\ 0 & 1 & 5 & -3 \\ 1 & -2 & -2 & 6 \end{pmatrix} \qquad \text{has transpose} \qquad A^T = \begin{pmatrix} 1 & 0 & 1 \\ -3 & 1 & -2 \\ -7 & 5 & -2 \\ 9 & -3 & 6 \end{pmatrix}.$$

Thus, the adjoint system to (2.37) is the following system of four equations in three unknowns:

$$\begin{aligned} y_1 + y_3 &= f_1, \\ -3y_1 + y_2 - 2y_3 &= f_2, \\ -7y_1 + 5y_2 - 2y_3 &= f_3, \\ 9y_1 - 3y_2 + 6y_3 &= f_4. \end{aligned} \qquad (2.38)$$
On  the  surface,  there  appears  to  be  no  direct  connection  between  the  solutions  to  a 
linear  system  and  its  adjoint.  Nevertheless,  as  we  shall  soon  see  (and  then  in  even  greater 
depth  in  Sections  4.4  and  8.7),  the  two  are  linked  in  a  number  of  remarkable,  but  subtle 
ways.  As  a  first  step  in  this  direction,  we  use  the  adjoint  system  to  define  the  remaining 
two  fundamental  subspaces  associated  with  a  coefficient  matrix  A. 


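Definition 2.47. The coimage of an $m \times n$ matrix $A$ is the image of its transpose,

$$\operatorname{coimg} A = \operatorname{img} A^T = \{\, A^T \mathbf{y} \mid \mathbf{y} \in \mathbb{R}^m \,\} \subset \mathbb{R}^n. \qquad (2.39)$$

The cokernel of $A$ is the kernel of its transpose,

$$\operatorname{coker} A = \ker A^T = \{\, \mathbf{w} \in \mathbb{R}^m \mid A^T \mathbf{w} = \mathbf{0} \,\} \subset \mathbb{R}^m, \qquad (2.40)$$

that is, the set of solutions to the homogeneous adjoint system.

The coimage coincides with the subspace of $\mathbb{R}^n$ spanned by the rows† of $A$, and is thus often referred to as the row space. As a direct consequence of Theorem 2.39, the adjoint system $A^T \mathbf{y} = \mathbf{f}$ has a solution if and only if $\mathbf{f} \in \operatorname{img} A^T = \operatorname{coimg} A$. The cokernel is also sometimes called the left null space of $A$, since it can be identified with the set of all row vectors $\mathbf{r}$ satisfying $\mathbf{r} A = \mathbf{0}^T$, where $\mathbf{0}^T$ is the row vector with $n$ zero entries. Indeed, we can identify $\mathbf{r} = \mathbf{w}^T$ and so, taking the transpose of the preceding equation, deduce $A^T \mathbf{w} = (\mathbf{w}^T A)^T = (\mathbf{r} A)^T = \mathbf{0}$, and so $\mathbf{w} = \mathbf{r}^T \in \operatorname{coker} A$.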


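† Or, more precisely, the column vectors obtained by transposing the rows.

Example 2.48. To solve the linear system (2.37) just presented, we perform Gaussian Elimination on the augmented matrix

$$\left(\begin{array}{cccc|c} 1 & -3 & -7 & 9 & b_1 \\ 0 & 1 & 5 & -3 & b_2 \\ 1 & -2 & -2 & 6 & b_3 \end{array}\right), \quad \text{reducing it to the row echelon form} \quad \left(\begin{array}{cccc|c} 1 & -3 & -7 & 9 & b_1 \\ 0 & 1 & 5 & -3 & b_2 \\ 0 & 0 & 0 & 0 & b_3 - b_2 - b_1 \end{array}\right).$$

Thus, the system has a solution if and only if

$$-b_1 - b_2 + b_3 = 0,$$

which is required in order that $\mathbf{b} \in \operatorname{img} A$. For such vectors, the general solution is

$$\mathbf{x} = \begin{pmatrix} b_1 + 3b_2 - 8x_3 \\ b_2 - 5x_3 + 3x_4 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} b_1 + 3b_2 \\ b_2 \\ 0 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} -8 \\ -5 \\ 1 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} 0 \\ 3 \\ 0 \\ 1 \end{pmatrix}.$$

In the second expression, the first vector represents a particular solution, while the two remaining terms constitute the general element of $\ker A$.

The solution to the adjoint system (2.38) is also obtained by Gaussian Elimination, starting with its augmented matrix

$$\left(\begin{array}{ccc|c} 1 & 0 & 1 & f_1 \\ -3 & 1 & -2 & f_2 \\ -7 & 5 & -2 & f_3 \\ 9 & -3 & 6 & f_4 \end{array}\right). \quad \text{The resulting row echelon form is} \quad \left(\begin{array}{ccc|c} 1 & 0 & 1 & f_1 \\ 0 & 1 & 1 & f_2 + 3f_1 \\ 0 & 0 & 0 & f_3 - 5f_2 - 8f_1 \\ 0 & 0 & 0 & f_4 + 3f_2 \end{array}\right).$$

Thus, there are two consistency constraints required for a solution to the adjoint system:

$$-8f_1 - 5f_2 + f_3 = 0, \qquad 3f_2 + f_4 = 0.$$

These are the conditions required for the right-hand side to belong to the coimage: $\mathbf{f} \in \operatorname{img} A^T = \operatorname{coimg} A$. If these conditions are satisfied, the adjoint system has the following general solution depending on the single free variable $y_3$:

$$\mathbf{y} = \begin{pmatrix} f_1 - y_3 \\ 3f_1 + f_2 - y_3 \\ y_3 \end{pmatrix} = \begin{pmatrix} f_1 \\ 3f_1 + f_2 \\ 0 \end{pmatrix} + y_3 \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix}.$$

In the latter formula, the first term represents a particular solution, while the second is the general element of the cokernel $\ker A^T = \operatorname{coker} A$.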
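
The conclusions of Example 2.48 about the adjoint system can be spot-checked numerically. The sketch below is illustrative only: the sample vectors y and the helper names `matvec` and `satisfies_constraints` are hypothetical. It forms f = A^T y for several y and confirms that each f obeys the two consistency constraints, then checks that (-1, -1, 1) is annihilated by A^T.

```python
def matvec(M, v):
    """Matrix-vector product M v for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

# Coefficient matrix of system (2.37) and its transpose.
A = [[1, -3, -7, 9],
     [0, 1, 5, -3],
     [1, -2, -2, 6]]
AT = [list(col) for col in zip(*A)]

def satisfies_constraints(f):
    """The two consistency constraints found for the adjoint system (2.38)."""
    f1, f2, f3, f4 = f
    return -8 * f1 - 5 * f2 + f3 == 0 and 3 * f2 + f4 == 0

# Any f of the form A^T y lies in the coimage, so it must pass the test.
samples = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [2, -3, 5]]
in_coimage = [satisfies_constraints(matvec(AT, y)) for y in samples]

# The cokernel direction: A^T w should be the zero vector in R^4.
w = [-1, -1, 1]
cokernel_check = matvec(AT, w)
print(in_coimage, cokernel_check)
```

In this example the cokernel vector also encodes the compatibility condition for the original system: its dot product with b is -b_1 - b_2 + b_3, the same expression that must vanish for b to lie in the image.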


The  Fundamental  Theorem  of  Linear  Algebra 

The four fundamental subspaces associated with an $m \times n$ matrix $A$, then, are its image, coimage, kernel, and cokernel. The image and cokernel are subspaces of $\mathbb{R}^m$, while the kernel and coimage are subspaces of $\mathbb{R}^n$. The Fundamental Theorem of Linear Algebra† states that their dimensions are determined by the rank (and size) of the matrix.

Theorem 2.49. Let $A$ be an $m \times n$ matrix, and let $r$ be its rank. Then

$$\begin{aligned} \dim \operatorname{coimg} A = \dim \operatorname{img} A &= \operatorname{rank} A = \operatorname{rank} A^T = r, \\ \dim \ker A = n - r, \qquad \dim \operatorname{coker} A &= m - r. \end{aligned} \qquad (2.41)$$

Thus, the rank of a matrix, i.e., the number of pivots, indicates the number of linearly independent columns, which, remarkably, is always the same as the number of linearly independent rows. A matrix and its transpose are guaranteed to have the same rank, i.e., the same number of pivots, despite the fact that their row echelon forms are quite different, and are almost never transposes of each other. Theorem 2.49 also establishes our earlier contention that the rank of a matrix is an intrinsic quantity, since it equals the common dimension of its image and coimage, and so does not depend on which specific elementary row operations are employed during the reduction process, nor on the final row echelon form.

† Not to be confused with the Fundamental Theorem of Algebra, which states that every (nonconstant) polynomial has a complex root; see [26].

Let  us  turn  to  the  proof  of  the  Fundamental  Theorem  2.49.  Since  the  dimension  of  a 
subspace  is  prescribed  by  the  number  of  vectors  in  any  basis,  we  need  to  relate  bases  of 
the  fundamental  subspaces  to  the  rank  of  the  matrix.  Before  trying  to  digest  the  general 
argument,  it  is  better  first  to  understand  how  to  construct  the  required  bases  in  a  particular 
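example. Consider the matrix

$$A = \begin{pmatrix} 2 & -1 & 1 & 2 \\ -8 & 4 & -6 & -4 \\ 4 & -2 & 3 & 2 \end{pmatrix}. \qquad \text{Its row echelon form} \qquad U = \begin{pmatrix} 2 & -1 & 1 & 2 \\ 0 & 0 & -2 & 4 \\ 0 & 0 & 0 & 0 \end{pmatrix} \qquad (2.42)$$

is obtained in the usual manner. There are two pivots, and thus the rank of $A$ is $r = 2$.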
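
The dimension count of Theorem 2.49 can be checked for this matrix with a short script. This is a minimal sketch, assuming exact rational arithmetic; the function name `rank` is a hypothetical helper that counts pivots during forward elimination, applied separately to the matrix and to its transpose.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by forward elimination with row interchanges."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    r = 0
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, m):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    return r

A = [[2, -1, 1, 2],
     [-8, 4, -6, -4],
     [4, -2, 3, 2]]
AT = [list(col) for col in zip(*A)]
m, n = len(A), len(A[0])

r = rank(A)
dims = {"img": r, "coimg": rank(AT), "ker": n - r, "coker": m - r}
print(dims)
```

Here the image and coimage dimensions are computed independently (from A and from its transpose), while the kernel and cokernel dimensions are read off from the formulas n - r and m - r of the theorem.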


Kernel: The general solution to the homogeneous system $A\mathbf{x} = \mathbf{0}$ can be expressed as a linear combination of $n - r$ linearly independent vectors, whose coefficients are the free variables for the system corresponding to the $n - r$ columns without pivots. In fact, these vectors form a basis for the kernel, which thus has dimension $n - r$.

In our example, the pivots are in columns 1 and 3, and so the free variables are $x_2, x_4$. Applying Back Substitution to the reduced homogeneous system $U\mathbf{x} = \mathbf{0}$, we obtain the general solution

$$\mathbf{x} = \begin{pmatrix} \tfrac{1}{2} x_2 - 2x_4 \\ x_2 \\ 2x_4 \\ x_4 \end{pmatrix} = x_2 \begin{pmatrix} \tfrac{1}{2} \\ 1 \\ 0 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} -2 \\ 0 \\ 2 \\ 1 \end{pmatrix}, \qquad (2.43)$$

written as a linear combination of the vectors

$$\mathbf{z}_1 = \left(\tfrac{1}{2}, 1, 0, 0\right)^T, \qquad \mathbf{z}_2 = (-2, 0, 2, 1)^T.$$

We claim that $\mathbf{z}_1, \mathbf{z}_2$ form a basis of $\ker A$. By construction, they span the kernel, and linear independence follows easily, since the only way in which the linear combination (2.43) could vanish is if both free variables vanish: $x_2 = x_4 = 0$.

Coimage: The coimage is the subspace of R^n spanned by the rows of A. (More correctly,
it is spanned by the transposes of the rows, since the elements of R^n are supposed to be
column vectors.) As we prove below, applying an elementary row operation to a matrix does
not alter its coimage. Since the row echelon form U is obtained from A by a sequence of
elementary row operations, we conclude that coimg A = coimg U. Moreover, the row echelon
structure implies that the r nonzero rows of U are necessarily linearly independent, and
hence form a basis of both coimg U and coimg A, which therefore have dimension r = rank A.
In our example, then, a basis for coimg A consists of the vectors

$$ s_1 = (2, -1, 1, 2)^T, \qquad s_2 = (0, 0, -2, 4)^T, $$

coming from the nonzero rows of U. The reader can easily check their linear independence,
as well as the fact that every row of A lies in their span.
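
The check left to the reader can also be done mechanically. The sketch below is an illustration, not the text's method of proof: the function names are hypothetical and `A` is again assumed to be the example matrix. It takes the nonzero rows of the row echelon form as the coimage basis and tests that every row of A lies in their span by verifying that appending the rows of A does not increase the rank.

```python
from fractions import Fraction

def row_echelon(rows):
    """Gaussian Elimination with exact fractions; returns (U, pivot column indices)."""
    U = [[Fraction(x) for x in row] for row in rows]
    m, n = len(U), len(U[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if U[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        U[r], U[p] = U[p], U[r]
        for i in range(r + 1, m):
            factor = U[i][c] / U[r][c]
            U[i] = [a - factor * b for a, b in zip(U[i], U[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return U, pivots

def coimage_basis(rows):
    """The nonzero rows of U; elimination leaves them on top, one per pivot."""
    U, pivots = row_echelon(rows)
    return U[:len(pivots)]

def rank(rows):
    """Number of pivots in the row echelon form."""
    return len(row_echelon(rows)[1])

A = [[2, -1, 1, 2], [-8, 4, -6, -4], [4, -2, 3, 2]]   # assumed example matrix
S = coimage_basis(A)
# Every row of A lies in span(S) exactly when stacking the rows of A
# underneath S leaves the rank unchanged.
rows_in_span = rank(S + A) == rank(S)
```

Comparing ranks avoids solving for the coefficients explicitly; it only confirms that the span does not grow.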

Image: There are two methods for computing a basis of the image, or column space.
The first proves that it has dimension equal to the rank. This has the important and
remarkable consequence that the space spanned by the rows of a matrix and the space
spanned by its columns always have the same dimension, even though they are usually
different subspaces of different vector spaces.

Now, the row echelon structure implies that the columns of U that contain the pivots
form a basis for its image, i.e., img U. In our example, these are its first and third columns,
and you can check that they are linearly independent and span the full column space. But
the image of A is not the same as the image of U, and so, unlike the coimage, we cannot
directly use a basis for img U as a basis for img A. However, the linear dependencies among
the columns of A and U are the same, and this implies that the r columns of A that end
up containing the pivots will form a basis for img A. In our example (2.42), the pivots lie
in the first and third columns of U, and hence the first and third columns of A, namely,

$$ (2, -8, 4)^T, \qquad (1, -6, 3)^T, $$

form a basis for img A. This means that every column of A can be written uniquely as a
linear combination of its first and third columns. Again, skeptics may wish to check this.
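
For skeptics, here is a small Python sketch of the pivot-column recipe. It is an illustration under assumptions: the names `image_basis` and `combine` are hypothetical, `A` is assumed to be the example matrix, and the coefficients used for the second and fourth columns were worked out by hand rather than computed by the code.

```python
from fractions import Fraction

def row_echelon(rows):
    """Gaussian Elimination with exact fractions; returns (U, pivot column indices)."""
    U = [[Fraction(x) for x in row] for row in rows]
    m, n = len(U), len(U[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if U[i][c] != 0), None)
        if p is None:
            continue
        U[r], U[p] = U[p], U[r]
        for i in range(r + 1, m):
            factor = U[i][c] / U[r][c]
            U[i] = [a - factor * b for a, b in zip(U[i], U[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return U, pivots

def image_basis(rows):
    """Columns of A (not of U) sitting in the pivot positions."""
    _, pivots = row_echelon(rows)
    return [[row[c] for row in rows] for c in pivots]

def combine(coeffs, vectors):
    """Linear combination coeffs[0]*vectors[0] + coeffs[1]*vectors[1] + ..."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(len(vectors[0]))]

A = [[2, -1, 1, 2], [-8, 4, -6, -4], [4, -2, 3, 2]]   # assumed example matrix
B = image_basis(A)                                     # first and third columns of A
col2 = [row[1] for row in A]
col4 = [row[3] for row in A]
col2_ok = combine([Fraction(-1, 2), 0], B) == col2     # column 2 = -1/2 (column 1)
col4_ok = combine([2, -2], B) == col4                  # column 4 = 2 (column 1) - 2 (column 3)
```

The two hand-computed relations mirror the kernel vectors: each vector in ker A records one linear dependency among the columns of A.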

An alternative method to find a basis for the image is to recall that img A = coimg A^T,
and hence we can employ the previous algorithm to compute coimg A^T. In our example,
applying Gaussian Elimination to

$$ A^T = \begin{pmatrix} 2 & -8 & 4 \\ -1 & 4 & -2 \\ 1 & -6 & 3 \\ 2 & -4 & 2 \end{pmatrix} \quad \text{leads to the row echelon form} \quad V = \begin{pmatrix} 2 & -8 & 4 \\ 0 & -2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{2.44} $$

Note that the row echelon form of A^T is not the transpose of the row echelon form of A.
However, they do have the same number of pivots, since, as we now know, both A and A^T
have the same rank, namely 2. The two nonzero rows of V (again transposed to be column
vectors) form a basis for coimg A^T, and therefore

$$ (2, -8, 4)^T, \qquad (0, -2, 1)^T $$

forms an alternative basis for img A.

Cokernel: Finally, to determine a basis for the cokernel, we apply the algorithm for
finding a basis for ker A^T = coker A. Since the ranks of A and A^T coincide, there are
now m - r free variables, which is the same as the dimension of ker A^T. In our particular
example, using the reduced form (2.44), the only free variable is y_3, and the general solution
to the homogeneous adjoint system A^T y = 0 is

$$ y = \begin{pmatrix} 0 \\ \tfrac12 y_3 \\ y_3 \end{pmatrix} = y_3 \begin{pmatrix} 0 \\ \tfrac12 \\ 1 \end{pmatrix}. $$

We conclude that coker A is one-dimensional, with basis (0, 1/2, 1)^T.

Summarizing, given an m × n matrix A with row echelon form U, to find a basis for

• img A: choose the r columns of A in which the pivots appear in U;

• ker A: write the general solution to Ax = 0 as a linear combination of the n - r basis
vectors whose coefficients are the free variables;

• coimg A: choose the r nonzero rows of U;

• coker A: write the general solution to the adjoint system A^T y = 0 as a linear
combination of the m - r basis vectors whose coefficients are the free variables.
(An alternative method, one that does not require solving the adjoint
system, can be found on page 223.)
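
The dimension counts in this summary, together with the cokernel recipe, can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, `A` is assumed to be the running example, and the cokernel is obtained by applying the kernel algorithm to the transpose.

```python
from fractions import Fraction

def row_echelon(rows):
    """Gaussian Elimination with exact fractions; returns (U, pivot column indices)."""
    U = [[Fraction(x) for x in row] for row in rows]
    m, n = len(U), len(U[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if U[i][c] != 0), None)
        if p is None:
            continue
        U[r], U[p] = U[p], U[r]
        for i in range(r + 1, m):
            factor = U[i][c] / U[r][c]
            U[i] = [a - factor * b for a, b in zip(U[i], U[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return U, pivots

def kernel_basis(rows):
    """One kernel vector per free variable, via Back Substitution."""
    U, pivots = row_echelon(rows)
    n = len(U[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for r in reversed(range(len(pivots))):
            c = pivots[r]
            s = sum(U[r][j] * x[j] for j in range(c + 1, n))
            x[c] = -s / U[r][c]
        basis.append(x)
    return basis

def transpose(rows):
    return [list(col) for col in zip(*rows)]

def fundamental_dimensions(rows):
    """Dimensions of the four fundamental subspaces of an m x n matrix of rank r."""
    m, n = len(rows), len(rows[0])
    r = len(row_echelon(rows)[1])
    return {"img": r, "coimg": r, "ker": n - r, "coker": m - r}

A = [[2, -1, 1, 2], [-8, 4, -6, -4], [4, -2, 3, 2]]   # assumed example matrix
dims = fundamental_dimensions(A)
coker = kernel_basis(transpose(A))                     # coker A = ker A^T
same_rank = len(row_echelon(A)[1]) == len(row_echelon(transpose(A))[1])
```

For the example this gives a one-dimensional cokernel spanned by (0, 1/2, 1)^T, and the ranks of A and its transpose agree.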

Let  us  conclude  this  section  by  justifying  these  constructions  for  general  matrices,  and 
thereby  complete  the  proof  of  the  Fundamental  Theorem  2.49. 

Kernel: If A has rank r, then the general element of the kernel, i.e., solution to the
homogeneous system Ax = 0, can be written as a linear combination of n - r vectors
whose coefficients are the free variables, and hence these vectors span ker A. Moreover,
the only combination that yields the zero solution x = 0 is when all the free variables are
zero, since any nonzero value for a free variable, say x_i ≠ 0, gives a solution x ≠ 0 whose
ith entry (at least) is nonzero. Thus, the only linear combination of the n - r kernel basis
vectors that sums to 0 is the trivial one, which implies their linear independence.

Coimage: We need to prove that elementary row operations do not change the coimage.
To see this for row operations of the first type, suppose, for instance, that Â is obtained
by adding b times the first row of A to the second row. If r_1, r_2, r_3, ..., r_m are the rows of
A, then the rows of Â are

$$ r_1, \quad \hat r_2 = r_2 + b\, r_1, \quad r_3, \ \ldots, \ r_m. $$

If

$$ v = c_1 r_1 + c_2 r_2 + c_3 r_3 + \cdots + c_m r_m $$

is any vector belonging to coimg A, then

$$ v = \hat c_1 r_1 + c_2 \hat r_2 + c_3 r_3 + \cdots + c_m r_m, \qquad \text{where} \quad \hat c_1 = c_1 - b\, c_2, $$

is also a linear combination of the rows of the new matrix, and hence lies in coimg Â.
The converse is also valid (v ∈ coimg Â implies v ∈ coimg A), and we conclude that
elementary row operations of type #1 do not change coimg A. The proofs for the other
two types of elementary row operations are even easier, and are left to the reader.
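
The coefficient bookkeeping in this argument can be tried on concrete numbers. The Python sketch below is a numerical illustration, not a proof: the matrix, the multiplier b = 4, and the coefficients c are arbitrary choices made here, and the helper names are hypothetical. It checks that replacing c_1 by c_1 - b c_2 reproduces the same vector v from the rows of the modified matrix.

```python
def add_multiple_of_first_row(rows, b):
    """Type #1 row operation: add b times row 1 to row 2, leaving other rows alone."""
    new_rows = [list(r) for r in rows]
    new_rows[1] = [x + b * y for x, y in zip(rows[1], rows[0])]
    return new_rows

def combine(coeffs, rows):
    """Linear combination of the rows with the given coefficients."""
    return [sum(c * r[j] for c, r in zip(coeffs, rows))
            for j in range(len(rows[0]))]

A = [[2, -1, 1, 2], [-8, 4, -6, -4], [4, -2, 3, 2]]   # illustrative matrix
b = 4                                                  # illustrative multiplier
A_hat = add_multiple_of_first_row(A, b)

c = [3, -5, 7]                         # arbitrary coefficients c_1, c_2, c_3
c_hat = [c[0] - b * c[1]] + c[1:]      # only the first coefficient changes

v = combine(c, A)                      # v as a combination of the rows of A
v_hat = combine(c_hat, A_hat)          # the same v from the rows of A_hat
```

Running the converse direction amounts to applying the same operation with -b in place of b.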

The basis for coimg A will be the first r nonzero pivot rows s_1, ..., s_r of U. Since the
other rows, if any, are all 0, the pivot rows clearly span coimg U = coimg A. To prove their
linear independence, suppose

$$ c_1 s_1 + \cdots + c_r s_r = 0. \tag{2.45} $$

Let u_{1k} ≠ 0 be the first pivot. Since all entries of U lying below the pivot are zero, the
kth entry of (2.45) is c_1 u_{1k} = 0, which implies that c_1 = 0. Next, suppose u_{2l} ≠ 0 is the
second pivot. Again, using the row echelon structure of U, the lth entry of (2.45) is found
to be c_1 u_{1l} + c_2 u_{2l} = 0, and so c_2 = 0, since we already know c_1 = 0. Continuing in this
manner, we deduce that only the trivial linear combination c_1 = ... = c_r = 0 will satisfy
(2.45), proving linear independence. Thus, s_1, ..., s_r form a basis for coimg U = coimg A,
which therefore has dimension r = rank A.

Image: In general, a vector b ∈ img A if and only if it can be written as a linear
combination of the columns: b = Ax. But, as we know, the general solution to the linear
system Ax = b is expressed in terms of the free and basic variables; in particular, we are
allowed to set all the free variables to zero, and so end up writing b in terms of the basic
variables alone. This effectively expresses b as a linear combination of the pivot columns
of A only, which proves that they span img A. To prove their linear independence, suppose
some linear combination of the pivot columns adds up to 0. Interpreting the coefficients
as basic variables, this would correspond to a vector x, all of whose free variables are
zero, satisfying Ax = 0. But our solution to this homogeneous system expresses the basic
variables as combinations of the free variables, which, if the latter are all zero, are also zero
when the right-hand sides all vanish. This shows that, under these assumptions, x = 0,
and hence the pivot columns are linearly independent.

Cokernel: By the preceding arguments, rank A = rank A^T = r, and hence the general
element of coker A = ker A^T can be written as a linear combination of m - r basis vectors
whose coefficients are the free variables in the homogeneous adjoint system A^T y = 0.
Linear independence of the basis elements follows as in the case of the kernel.


Exercises 

2.5.21. For each of the following matrices find bases for the (i) image, (ii) coimage,
(iii) kernel, and (iv) cokernel.

(1 

-3 

2 

2 

1\ 

(0  0  —8 \ 

(l 

1 

2 

n 

0 

3 

-6 

0 

-2 

(b) 

12-1 

,  (c) 

1 

0 

-1 

3 

,  (d) 

2 

-3 

-2 

4 

0 

^2  4  6 ) 

^2 

3 

7 

3 

-3 

-6 

6 

3 

Vi 

0 

-4 

2 

3/ 

2.5.22. Find a set of columns of the matrix

$$ \begin{pmatrix} -1 & 2 & 0 & -3 & 5 \\ 2 & -4 & 1 & 1 & -4 \\ -3 & 6 & 2 & 0 & 8 \end{pmatrix} $$

that form a basis for its image. Then express each column as a linear combination of the
basis columns.


2.5.23. For each of the following matrices A: (a) Determine the rank and the dimensions of the
four fundamental subspaces. (b) Find bases for both the kernel and cokernel. (c) Find
explicit conditions on vectors b that guarantee that the system Ax = b has a solution.
(d) Write down a specific nonzero vector b that satisfies your conditions, and then find all
possible solutions x.


(0 


1 

2 


0) 


/  2 
6 
3 

VI 


\ 

( 

1 

5  \ 

,  (Hi) 

-2 

3 

,  ( ™ ) 

/ 

K 

2 

7) 

V 

(yii) 


(  2 
1 
3 

V-3 


4 

2 

6 

-6 


0 

3 

1 

2 


-5 

-6 

-4 

-6 

15 

15 

21 


2.5.24.  Find  the  dimension  of  and  a  basis  for  the  subspace  spanned  by  the  following  sets  of 
vectors.  Hint :  First  identify  the  subspace  with  the  image  of  a  certain  matrix. 

/1\  / 1 \  /  2 \  /  1\ 


(a) 


(  1\ 

2 

v-i / 


(b) 


(  l\ 

1 

V-1  / 


(  2  \ 
2 

V-2  / 


/— 3\ 
-3 

3  / 


V 


(d) 


1\ 

/  0\ 

/  — 3  \ 

0 

1 

-4 

-3 

? 

2 

? 

1 

2  / 

1-3  J 

V  6/ 

( 


V 


1\ 
3 
-8 
7/ 


( 


V 


2  \ 
1 
6 
9/ 


/ 


(e) 


V 


1\ 
1 

-1 
1 
1 / 


(c)
V


o 

l 

VO  / 

2  \ 

1 
2 
2 

1/ 


0 
0 

Vi/ 

(  3\ 

0 
1 
3 

V  2  /
1

VO/ 

0\ 

3 

4 
0 

1/
3

-3/ 

1\ 
3 

-1 
2 
1/ 


/1\ 
0 
3 
2 

VO/

2.5.25. Show that the set of all vectors v = (a - 3b, a + 2c + 4d, b + 3c - d, c - d)^T, where
a, b, c, d are real numbers, forms a subspace of R^4, and find its dimension.

2.5.26.  Find  a  basis  of  the  solution  space  of  the  following  homogeneous  linear  systems. 


(a) 


x1  —  2  x?j  =  0, 

Xn  +  xA  =  0. 


(b) 


2x 1  +  x2  —  3x3  +  x4  =  0, 


x 


i~  x2  —  2x3  +  4x4  =  0, 


2x1  —  x2  —  x3  —  x4  =  0. 


(c) 


2.5.27.  Find  bases  for  the  image  and  coimage  of 


( 


2x1  +  x2  —  x4  =  0, 

—  2x1  +  2x3  —  2x4  =  0. 

1-3  0 

2  —6  4  I .  Make  sure  they  have  the 


3  9  1 

same  number  of  elements.  Then  write  each  row  and  column  as  a  linear  combination  of  the 
appropriate  basis  vectors. 


2.5.28. Find bases for the image of

$$ \begin{pmatrix} 1 & 2 & -1 \\ 0 & 3 & -3 \\ 2 & -4 & 6 \\ 1 & 5 & -4 \end{pmatrix} $$

using both of the indicated methods. Demonstrate that they are indeed both bases for the
same subspace by showing how to write each basis in terms of the other.


2.5.29. Show that v_1 = (1, 2, 0, -1)^T, v_2 = (-3, 1, 1, -1)^T, v_3 = (2, 0, -4, 3)^T and
w_1 = (3, 2, -4, 2)^T, w_2 = (2, 3, -7, 4)^T, w_3 = (0, 3, -3, 1)^T are two bases for the same
three-dimensional subspace V ⊂ R^4.

2.5.30. (a) Prove that if A is a symmetric matrix, then ker A = coker A and img A = coimg A.
(b) Use this observation to produce bases for the four fundamental subspaces associated
with

$$ A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 6 & 2 \\ 0 & 2 & 2 \end{pmatrix}. $$

(c) Is the converse to part (a) true?


2.5.31. (a) Write down a matrix of rank r whose first r rows do not form a basis for its row
space. (b) Can you find an example that can be reduced to row echelon form without any
row interchanges?

2.5.32. Let A be a 4 × 4 matrix and let U be its row echelon form. (a) Suppose columns 1, 2,
4 of U form a basis for its image. Do columns 1, 2, 4 of A form a basis for its image? If so,
explain why; if not, construct a counterexample. (b) Suppose rows 1, 2, 3 of U form a basis
for its coimage. Do rows 1, 2, 3 of A form a basis for its coimage? If so, explain why; if not,
construct a counterexample. (c) Suppose you find a basis for ker U. Is it also a basis for
ker A? (d) Suppose you find a basis for coker U. Is it also a basis for coker A?

2.5.33.  Can  you  devise  a  nonzero  matrix  whose  row  echelon  form  is  the  same  as  the  row 
echelon  form  of  its  transpose? 

♦ 2.5.34. Explain why the elementary row operations of types #2 and #3 do not change the
coimage of a matrix.

2.5.35. Let A be an m × n matrix. Prove that img A = R^m if and only if rank A = m.

2.5.36. Prove or give a counterexample: If U is the row echelon form of A, then img U = img A.

♦ 2.5.37. (a) Devise an alternative method for finding a basis of the coimage of a matrix.
Hint: Look at the two methods for finding a basis for the image. (b) Use your method
to find a basis for the coimage of

$$ \begin{pmatrix} 1 & 3 & -5 & 2 \\ 2 & -1 & 1 & -4 \\ 4 & 5 & -9 & 2 \end{pmatrix}. $$

Is it the same basis as found by the method in the text?

♦ 2.5.38. Prove that ker A ⊂ ker A^2. More generally, prove ker A ⊂ ker BA for every compatible
matrix B.

♦ 2.5.39. Prove that img A ⊃ img A^2. More generally, prove img A ⊃ img (AB) for every
compatible matrix B.

2.5.40. Suppose A is an m × n matrix, and B and C are nonsingular matrices of sizes m × m
and n × n, respectively. Prove that rank A = rank BA = rank AC = rank BAC.

2.5.41. True or false: If ker A = ker B, then rank A = rank B.

♦ 2.5.42. Let A and B be matrices of respective sizes m × n and n × p.
(a) Prove that dim ker(AB) ≤ dim ker A + dim ker B.
(b) Prove the Sylvester Inequalities rank A + rank B - n ≤ rank(AB) ≤ min{ rank A, rank B }.


♦ 2.5.43. Suppose A is a nonsingular n × n matrix. (a) Prove that every n × (n + k) matrix of
the form (A B), where B has size n × k, has rank n. (b) Prove that every (n + k) × n
matrix of the form $\begin{pmatrix} A \\ C \end{pmatrix}$, where C has size k × n, has rank n.

♦ 2.5.44. Let A be an m × n matrix of rank r. Suppose v_1, ..., v_n are a basis for R^n such that
v_{r+1}, ..., v_n form a basis for ker A. Prove that w_1 = A v_1, ..., w_r = A v_r form a basis
for img A.

♦ 2.5.45. (a) Suppose A, B are m × n matrices such that ker A = ker B. Prove that there is a
nonsingular m × m matrix M such that MA = B. Hint: Use Exercise 2.5.44. (b) Use this
to conclude that if Ax = b and Bx = c have the same solutions then they are equivalent
linear systems, i.e., one can be obtained from the other by a sequence of elementary row
operations.

♦ 2.5.46. (a) Let A be an m × n matrix and let V be a subspace of R^n. Show that W = AV =
{ Av | v ∈ V } forms a subspace of img A. (b) If dim V = k, show that dim W ≤ min{ k, r },
where r = rank A. Give an example in which dim(AV) < dim V. Hint: Use Exercise 2.4.25.

♦ 2.5.47. (a) Show that an m × n matrix has a left inverse if and only if it has rank n.
Hint: Use Exercise 2.5.46. (b) Show that it has a right inverse if and only if it has rank m.
(c) Conclude that only nonsingular square matrices have both left and right inverses.


2.6  Graphs  and  Digraphs 

We now present an intriguing application of linear algebra to graph theory. A graph consists
of a finite number of points, called vertices, and finitely many lines or curves connecting
them, called edges. Each edge connects exactly two vertices, which are its endpoints. To
avoid technicalities, we will always assume that the graph is simple, which means that
every edge connects two distinct vertices, so no edge forms a loop that connects a vertex
to itself, and, moreover, two distinct vertices are connected by at most one edge. Some
examples of graphs appear in Figure 2.6; the vertices are the black dots and the edges are
the lines connecting them.

Graphs arise in a multitude of applications. A particular case that will be considered in
depth is electrical networks, where the edges represent wires, and the vertices represent the
nodes where the wires are connected. Another example is the framework for a building:
the edges represent the beams, and the vertices the joints where the beams are connected.
In each case, the graph encodes the topology (meaning interconnectedness) of the
system, but not its geometry (lengths of edges, angles, etc.).

In a planar representation of a graph, the edges are allowed to cross over each other
at non-nodal points without meeting; think of a network where the (insulated) wires lie
on top of each other, but do not interconnect. Thus, the first graph in Figure 2.6 has 5
vertices and 8 edges; the second has 4 vertices and 6 edges (the two central edges do not
meet); the final graph has 5 vertices and 10 edges.

Figure 2.6. Three Different Graphs.

Figure 2.7. Three Versions of the Same Graph.

Two graphs are considered to be the same if there is a one-to-one correspondence between
their edges and their vertices, so that matched edges connect matched vertices. In
an electrical network, moving the nodes and wires around without cutting or rejoining will
have no effect on the underlying graph. Consequently, there are many ways to draw a given
graph; three representations of one and the same graph appear in Figure 2.7.

A path in a graph is an ordered list of distinct edges e_1, ..., e_k connecting (not necessarily
distinct) vertices v_1, ..., v_{k+1} so that edge e_i connects vertex v_i to v_{i+1}. For instance, in
the graph in Figure 2.8, one path starts at vertex 1, then goes in order along the edges
labeled as 1, 4, 3, 2, successively passing through the vertices 1, 2, 4, 1, 3. Observe that while
an edge cannot be repeated in a path, a vertex may be. A graph is connected if you can
get from any vertex to any other vertex by a path, which is the most important case for
applications. We note that every graph can be decomposed into a disconnected collection
of connected subgraphs.

A circuit is a path that ends up where it began, i.e., v_{k+1} = v_1. For example, the circuit
in Figure 2.8 consisting of edges 1, 4, 5, 2 starts at vertex 1, then goes to vertices 2, 4, 3 in
order, and finally returns to vertex 1. In a closed circuit, the choice of starting vertex is
not important, and we identify circuits that go around the edges in the same order. Thus,
for example, the edges 4, 5, 2, 1 represent the same circuit as above.

Figure 2.8. A Simple Graph.


In  electrical  circuits,  one  is  interested  in  measuring  currents  and  voltage  drops  along  the 
wires  in  the  network  represented  by  the  graph.  Both  of  these  quantities  have  a  direction, 
and  therefore  we  need  to  specify  an  orientation  on  each  edge  in  order  to  quantify  how  the 
current  moves  along  the  wire.  The  orientation  will  be  fixed  by  specifying  the  vertex  the 
edge  “starts”  at,  and  the  vertex  it  “ends”  at.  Once  we  assign  a  direction  to  an  edge,  a 
current  along  that  wire  will  be  positive  if  it  moves  in  the  same  direction,  i.e.,  goes  from 
the  starting  vertex  to  the  ending  one,  and  negative  if  it  moves  in  the  opposite  direction. 
The  direction  of  the  edge  does  not  dictate  the  direction  of  the  current  —  it  just  fixes  what 
directions positive and negative values of current represent. A graph with directed edges
is known as a directed graph, or digraph for short. The edge directions are represented by
arrows; examples of digraphs can be seen in Figure 2.9. Again, the underlying graph is
always  assumed  to  be  simple.  For  example,  at  any  instant  in  time,  the  internet  can  be 
viewed  as  a  gigantic  digraph,  in  which  each  vertex  represents  a  web  page,  and  each  edge 
represents  an  existing  link  from  one  page  to  another. 

Consider a digraph D consisting of n vertices connected by m edges. The incidence
matrix associated with D is an m × n matrix A whose rows are indexed by the edges and
whose columns are indexed by the vertices. If edge k starts at vertex i and ends at vertex
j, then row k of the incidence matrix will have +1 in its (k, i) entry and -1 in its (k, j)
entry; all other entries in the row are zero. Thus, our convention is that +1 represents the
outgoing vertex at which the edge starts and -1 the incoming vertex at which it ends.

Figure 2.10. A Simple Digraph.

A simple example is the digraph in Figure 2.10, which consists of five edges joined at
four different vertices. Its 5 × 4 incidence matrix is

$$ A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{pmatrix}. \tag{2.46} $$

Thus the first row of A tells us that the first edge starts at vertex 1 and ends at vertex
2. Similarly, row 2 says that the second edge goes from vertex 1 to vertex 3, and so on.
Clearly, one can completely reconstruct any digraph from its incidence matrix.
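
The sign convention is easy to mechanize. In the Python sketch below, the function name `incidence_matrix` is hypothetical, and the edge list for Figure 2.10 is an assumption: the figure itself is not reproduced here, so the edges were read off from the transposed incidence matrix (2.48) given later in the section.

```python
def incidence_matrix(num_vertices, edges):
    """Build the m x n incidence matrix of a digraph.

    edges is a list of (start, end) pairs, with vertices numbered from 1;
    row k describes edge k.
    """
    A = []
    for start, end in edges:
        row = [0] * num_vertices
        row[start - 1] = 1     # +1 marks the vertex where the edge starts
        row[end - 1] = -1      # -1 marks the vertex where the edge ends
        A.append(row)
    return A

# Assumed edges of the digraph in Figure 2.10, in the order edge 1, ..., edge 5.
edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]
A = incidence_matrix(4, edges)
for row in A:
    print(row)
```

Each row sums to zero by construction, a fact that reappears below in the description of the kernel.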


Example 2.50. The matrix

$$ A = \begin{pmatrix} 1 & -1 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & -1 & 1 & 0 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & -1 \end{pmatrix} \tag{2.47} $$

qualifies as an incidence matrix of a simple graph because each row contains a single +1,
a single -1, and the other entries are 0; moreover, to ensure simplicity, no two rows are
identical or -1 times each other. Let us construct the digraph corresponding to A. Since
A has five columns, there are five vertices in the digraph, which we label by the numbers
1, 2, 3, 4, 5. Since it has seven rows, there are 7 edges. The first row has its +1 in column
1 and its -1 in column 2, and so the first edge goes from vertex 1 to vertex 2. Similarly,
the second edge corresponds to the second row of A and so goes from vertex 3 to vertex 1.
The third row of A indicates an edge from vertex 3 to vertex 2; and so on. In this manner,
we construct the digraph drawn in Figure 2.11.
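
Reading the edges off an incidence matrix, and testing the two conditions stated in the example, can be sketched as follows. The function names are hypothetical. The matrix is (2.47) as best it can be recovered: its first three rows are confirmed by the discussion above, while the remaining rows are a reconstruction and should be treated as an assumption.

```python
def edges_from_incidence(A):
    """Return the list of (start, end) vertex pairs, one per row of A."""
    edges = []
    for row in A:
        # a valid row has exactly one +1, exactly one -1, and zeros elsewhere
        if sorted(x for x in row if x != 0) != [-1, 1]:
            raise ValueError("each row needs a single +1 and a single -1")
        edges.append((row.index(1) + 1, row.index(-1) + 1))
    return edges

def is_simple(A):
    """True if no two edges join the same pair of vertices (in either direction)."""
    pairs = [frozenset(edge) for edge in edges_from_incidence(A)]
    return len(set(pairs)) == len(pairs)

A = [
    [1, -1, 0, 0, 0],
    [-1, 0, 1, 0, 0],
    [0, -1, 1, 0, 0],
    [0, 1, 0, -1, 0],     # rows 4 to 7: reconstructed, see the caveat above
    [0, 0, -1, 1, 0],
    [0, 0, 1, 0, -1],
    [0, 0, 0, 1, -1],
]
edges = edges_from_incidence(A)
simple = is_simple(A)
```

Comparing unordered vertex pairs covers both forbidden cases at once: identical rows and rows that are -1 times each other.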


Figure 2.11. Another Digraph.

The incidence matrix serves to encode important geometric information about the
digraph it represents. In particular, its kernel and cokernel have topological significance.
For example, the kernel of the incidence matrix (2.47) is spanned by the single vector
z = (1, 1, 1, 1, 1)^T, and represents the fact that the sum of the entries in any given row of
A is zero. This observation holds in general for connected digraphs.

Proposition 2.51. If A is the incidence matrix for a connected digraph, then ker A is
one-dimensional, with basis z = (1, 1, ..., 1)^T.

Proof: If edge k connects vertex i to vertex j, then the kth equation in Az = 0 is z_i - z_j = 0,
or, equivalently, z_i = z_j. The same equality holds, by a simple induction, if the vertices i
and j are connected by a path. Therefore, if D is connected, then all the entries of z are
equal, and the result follows. Q.E.D.

Remark.  In  general,  dim  ker  A  equals  the  number  of  connected  components  in  the  digraph 
D.  See  Exercise  2.6.12. 
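
The relation between dim ker A and the number of connected components can be observed numerically. The sketch below is an experiment, not a proof: the function names are hypothetical, the first edge list is the assumed digraph of Figure 2.10, and the second is a made-up digraph with two separate pieces. Components are counted independently with a union-find structure and compared with n - rank A.

```python
from fractions import Fraction

def incidence_matrix(num_vertices, edges):
    """Rows are edges (start, end); +1 at the start vertex, -1 at the end vertex."""
    A = []
    for start, end in edges:
        row = [0] * num_vertices
        row[start - 1], row[end - 1] = 1, -1
        A.append(row)
    return A

def rank(rows):
    """Number of pivots found by Gaussian Elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def count_components(num_vertices, edges):
    """Count connected components, ignoring edge directions (union-find)."""
    parent = list(range(num_vertices + 1))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in range(1, num_vertices + 1)})

connected = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]   # assumed edges of Figure 2.10
split = [(1, 2), (2, 3), (4, 5)]                         # hypothetical, two pieces

dim_ker_connected = 4 - rank(incidence_matrix(4, connected))
dim_ker_split = 5 - rank(incidence_matrix(5, split))
```

In both cases the kernel dimension matches the component count, in line with the remark.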

Applying  the  Fundamental  Theorem  2.49,  we  immediately  deduce  the  following: 

Corollary 2.52. If A is the incidence matrix for a connected digraph with n vertices, then
rank A = n - 1.

Next, let us look at the cokernel of an incidence matrix. Consider the particular example
(2.46) corresponding to the digraph in Figure 2.10. We need to compute the kernel of the
transposed incidence matrix

$$ A^T = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 & 1 \\ 0 & 0 & -1 & -1 & -1 \end{pmatrix}. \tag{2.48} $$

Solving the homogeneous system A^T y = 0 by Gaussian Elimination, we discover that
coker A = ker A^T is spanned by the two vectors

$$ y_1 = (1, 0, -1, 1, 0)^T, \qquad y_2 = (0, 1, -1, 0, 1)^T. $$

Each of these vectors represents a circuit in the digraph. Keep in mind that their entries
are indexed by the edges, so a nonzero entry indicates the direction to traverse the
corresponding edge. For example, y_1 corresponds to the circuit that starts out along edge 1,
then goes along edge 4 and finishes by going along edge 3 in the reverse direction, which is
indicated by the minus sign in its third entry. Similarly, y_2 represents the circuit consisting
of edge 2, followed by edge 5, and then edge 3, backwards. The fact that y_1 and y_2 are
linearly independent vectors says that the two circuits are “independent”.

The general element of coker A is a linear combination c_1 y_1 + c_2 y_2. Certain values of
the constants lead to other types of circuits; for example, -y_1 represents the same circuit
as y_1, but traversed in the opposite direction. Another example is

$$ y_1 - y_2 = (1, -1, 0, 1, -1)^T, $$

which represents the square circuit going around the outside of the digraph along edges
1, 4, 5, 2, the fifth and second edges taken in the reverse direction. We can view this circuit
as a combination of the two triangular circuits; when we add them together, the middle
edge 3 is traversed once in each direction, which effectively “cancels” its contribution. (A
similar cancellation occurs in the calculus of line integrals, [2, 78].) Other combinations
represent “virtual” circuits; for instance, one can “interpret” 2y_1 - (1/2)y_2 as two times around
the first triangular circuit plus one-half of the other triangular circuit, taken in the reverse
direction, whatever that might mean.

Let  us  summarize  the  preceding  discussion. 


Theorem 2.53. Each circuit in a digraph D is represented by a vector in the cokernel of
its incidence matrix A, whose entries are +1 if the edge is traversed in the correct direction,
-1 if in the opposite direction, and 0 if the edge is not in the circuit. The dimension of
the cokernel of A equals the number of independent circuits in D.
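
The correspondence between circuits and cokernel vectors can be illustrated in Python. This is a sketch under assumptions: the function names `circuit_vector` and `transpose_times` are hypothetical, and the edge list is the assumed digraph of Figure 2.10. A circuit is given as a closed walk through the vertices; each edge traversed forwards contributes +1 and each edge traversed backwards contributes -1.

```python
def incidence_matrix(num_vertices, edges):
    """Rows are edges (start, end); +1 at the start vertex, -1 at the end vertex."""
    A = []
    for start, end in edges:
        row = [0] * num_vertices
        row[start - 1], row[end - 1] = 1, -1
        A.append(row)
    return A

def circuit_vector(edges, walk):
    """Edge-indexed vector of a closed walk given as a list of vertices."""
    y = [0] * len(edges)
    for a, b in zip(walk, walk[1:]):
        if (a, b) in edges:
            y[edges.index((a, b))] += 1    # edge traversed in its own direction
        else:
            y[edges.index((b, a))] -= 1    # edge traversed backwards
    return y

def transpose_times(A, y):
    """Compute A^T y without forming the transpose explicitly."""
    return [sum(A[k][j] * y[k] for k in range(len(A))) for j in range(len(A[0]))]

edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]   # assumed edges of Figure 2.10
A = incidence_matrix(4, edges)

y1 = circuit_vector(edges, [1, 2, 4, 1])        # edges 1, 4, then 3 backwards
y2 = circuit_vector(edges, [1, 3, 4, 1])        # edges 2, 5, then 3 backwards
outer = circuit_vector(edges, [1, 2, 4, 3, 1])  # the outside square circuit
```

Each of the three vectors is annihilated by A^T, and the outer circuit equals the difference of the two triangular ones.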


Remark. A full proof that the cokernel of the incidence matrix of a general digraph has
a basis consisting entirely of independent circuits requires a more in-depth analysis of the
properties of graphs than we can provide in this abbreviated treatment. Full details can
be found in [6; §11.3].

The preceding two theorems have an important and remarkable consequence. Suppose
D is a connected digraph with m edges and n vertices and A its m × n incidence matrix.
Corollary 2.52 implies that A has rank r = n - 1 = n - dim ker A. On the other hand,
Theorem 2.53 tells us that l = dim coker A equals the number of independent circuits in
D. The Fundamental Theorem 2.49 says that r = m - l. Equating these two formulas for
the rank, we obtain r = n - 1 = m - l, or n + l = m + 1. This celebrated result is known as
Euler’s formula for graphs, first discovered by the extraordinarily prolific and influential
eighteenth-century Swiss mathematician Leonhard Euler.†

Theorem 2.54. If G is a connected graph, then

# vertices + # independent circuits = # edges + 1. (2.49)


Remark. If the graph is planar, meaning that it can be graphed in the plane without
any edges crossing over each other, then the number of independent circuits is equal to the
number of “holes”, i.e., the number of distinct polygonal regions bounded by the edges of
the graph. For example, the pentagonal digraph in Figure 2.11 bounds three triangles, and
so has three independent circuits.


† Pronounced “Oiler”. Euler spent most of his career in Russia and Germany.

Figure 2.12. A Cubical Digraph.


Example 2.55. Consider the graph corresponding to the edges of a cube, as illustrated
in Figure 2.12, where the second figure represents the same graph squashed down onto a
plane. The graph has 8 vertices and 12 edges. Euler’s formula (2.49) tells us that there
are 5 independent circuits. These correspond to the interior square and four trapezoids in
the planar version of the digraph, and hence to circuits around 5 of the 6 faces of the cube.
The “missing” face does indeed define a circuit, but it can be represented as the sum of
the other five circuits, and so is not independent. In Exercise 2.6.6, the reader is asked to
write out the incidence matrix for the cubical digraph and explicitly identify the basis of
its cokernel with the circuits.
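
The count of five independent circuits can be confirmed by computing the rank of an incidence matrix for the cube. The Python sketch below rests on choices made here for illustration only: vertices are labeled 0 to 7 and read as three-bit strings, two vertices are joined when their labels differ in exactly one bit, and each edge is oriented from the smaller label to the larger. The orientation in Figure 2.12 may differ, but the rank does not depend on it.

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Number of pivots found by Gaussian Elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

vertices = range(8)   # cube corners, labeled by three-bit numbers
# two corners share an edge exactly when their labels differ in one bit
edges = [(a, b) for a, b in combinations(vertices, 2)
         if bin(a ^ b).count("1") == 1]
# +1 at the start vertex a, -1 at the end vertex b, 0 elsewhere
A = [[1 if v == a else -1 if v == b else 0 for v in vertices]
     for a, b in edges]

r = rank(A)                       # equals n - 1 for a connected digraph
num_circuits = len(edges) - r     # dim coker A = m - r
```

With 8 vertices and 12 edges this gives rank 7 and hence 5 independent circuits, consistent with Euler's formula.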

The many remarkable connections between graph theory and linear algebra will be
developed further in the later chapters. The applications to very large
graphs, e.g., with millions or billions of vertices, are playing an increasingly important role
in modern computer science and data analysis. One example is the dominant internet
search engine run by Google, which is based on viewing the entire internet as a gigantic
(time-dependent) digraph. The vertices are the web pages, and a directed edge represents
a link from one web page to another. (The resulting digraph is not simple according to our
definition, since web pages can link in both directions.) Ranking web pages by importance
during a search relies on analyzing the internet digraph; see Section 9.3 for further details.


Exercises 


2.6.1. (a) Draw the graph corresponding to the 6 × 7 incidence matrix whose nonzero (i, j) entries
equal 1 if j = i and −1 if j = i + 1, for i = 1 to 6. (b) Find a basis for its
kernel and cokernel. (c) How many circuits are in the digraph?


2.6.2. Draw the digraph represented by the following incidence matrices:

2.6.3. Write out the incidence matrix of the following digraphs.


2.6.4.  For  each  of  the  digraphs  in  Exercise  2.6.3,  see  whether  you  can  predict  a  collection  of 
independent  circuits.  Verify  your  prediction  by  constructing  a  suitable  basis  of  the  cokernel 
of  the  incidence  matrix  and  identifying  each  basis  vector  with  a  circuit. 

G 2.6.5. (a) Write down the incidence matrix A for the indicated digraph.
(b) What is the rank of A? (c) Determine the dimensions of its four
fundamental subspaces. (d) Find a basis for its kernel and cokernel.
(e) Determine explicit conditions on vectors b that guarantee that the system
Ax = b has a solution. (f) Write down a specific nonzero vector b that
satisfies your conditions, and then find all possible solutions.

0 2.6.6. (a) Write out the incidence matrix for the cubical digraph and identify the basis of its
cokernel with the circuits. (b) Find three circuits that do not correspond to any of your
basis elements, and express them as a linear combination of the basis circuit vectors.

G 2.6.7. Write out the incidence matrix for the other Platonic solids: (a) tetrahedron,
(b) octahedron, (c) dodecahedron, and (d) icosahedron. (You will need to choose an
orientation for the edges.) Show that, in each case, the number of independent circuits
equals the number of faces minus 1.


0  2.6.8.  Prove  that  a  graph  with  n  vertices  and  n  edges  must  have  at  least  one  circuit. 

G 2.6.9. A connected graph is called a tree if it has no circuits. (a) Find the incidence matrix for
each of the following directed trees:
(b) Draw all distinct trees with 4 vertices. Assign a direction to the edges, and write down
the corresponding incidence matrices. (c) Prove that a connected graph on n vertices is a
tree if and only if it has precisely n − 1 edges.


G 2.6.10. A complete graph $G_n$ on n vertices has one edge joining every distinct pair of vertices.
(a) Draw $G_3$, $G_4$ and $G_5$. (b) Choose an orientation for each edge and write out the
resulting incidence matrix of each digraph. (c) How many edges does $G_n$ have? (d) How
many independent circuits?


C 2.6.11. The complete bipartite digraph $G_{m,n}$ is based on two disjoint sets of, respectively, m
and n vertices. Each vertex in the first set is connected to each vertex in the second set
by a single edge. (a) Draw $G_{2,3}$, $G_{2,4}$, and $G_{3,3}$. (b) Write the incidence matrix of each
digraph. (c) How many edges does $G_{m,n}$ have? (d) How many independent circuits?

T 2.6.12. (a) Construct the incidence matrix A for the disconnected digraph
D in the figure. (b) Verify that dim ker A = 3, which is the same
as the number of connected components, meaning the maximal
connected subgraphs in D. (c) Can you assign an interpretation
to your basis for ker A? (d) Try proving the general statement
that dim ker A equals the number of connected components in the
digraph D.


2.6.13.  How  does  altering  the  direction  of  the  edges  of  a  digraph  affect  its  incidence 
matrix?  The  cokernel  of  its  incidence  matrix?  Can  you  realize  this  operation  by 
matrix  multiplication? 


C 2.6.14. (a) Explain why two digraphs are equivalent under relabeling of vertices and
edges if and only if their incidence matrices satisfy P A Q = B, where P, Q are
permutation matrices. (b) Decide which of the following incidence matrices produce
equivalent digraphs:
(c) How are the cokernels of equivalent incidence matrices related?


2.6.15. True or false: If A and B are incidence matrices of the same size and
coker A = coker B, then the corresponding digraphs are equivalent.

<0  2.6.16.  (a)  Explain  why  the  incidence  matrix  for  a  disconnected  graph  can  be  written  in  block 

diagonal matrix form $A = \begin{pmatrix} B & O \\ O & C \end{pmatrix}$ under an appropriate labeling of the vertices.

(b)  Show  how  to  label  the  vertices  of  the  digraph  in  Exercise  2.6.3e  so  that  its  incidence 
matrix  is  in  block  form. 


Chapter 3

Inner  Products  and  Norms 


The  geometry  of  Euclidean  space  is  founded  on  the  familiar  properties  of  length  and  angle. 
The  abstract  concept  of  a  norm  on  a  vector  space  formalizes  the  geometrical  notion  of  the 
length  of  a  vector.  In  Euclidean  geometry,  the  angle  between  two  vectors  is  specified  by 
their  dot  product,  which  is  itself  formalized  by  the  abstract  concept  of  an  inner  product. 
Inner products and norms lie at the heart of linear (and nonlinear) analysis, in both
finite-dimensional  vector  spaces  and  infinite-dimensional  function  spaces.  A  vector  space 
equipped  with  an  inner  product  and  its  associated  norm  is  known  as  an  inner  product 
space.  It  is  impossible  to  overemphasize  their  importance  for  theoretical  developments, 
practical  applications,  and  the  design  of  numerical  solution  algorithms. 

Mathematical  analysis  relies  on  the  exploitation  of  inequalities.  The  most  fundamental 
is  the  Cauchy-Schwarz  inequality,  which  is  valid  in  every  inner  product  space.  The  more 
familiar  triangle  inequality  for  the  associated  norm  is  then  derived  as  a  simple  consequence. 
Not  every  norm  comes  from  an  inner  product,  and,  in  such  cases,  the  triangle  inequality 
becomes part of the general definition. Both inequalities retain their validity in both
finite-dimensional and infinite-dimensional vector spaces. Indeed, their abstract formulation
exposes  the  key  ideas  behind  the  proof,  avoiding  all  distracting  particularities  appearing 
in  the  explicit  formulas. 

The  characterization  of  general  inner  products  on  Euclidean  space  will  lead  us  to  the 
noteworthy  class  of  positive  definite  matrices.  Positive  definite  matrices  appear  in  a  wide 
variety  of  applications,  including  minimization,  least  squares,  data  analysis  and  statistics, 
as well as, for example, mechanical systems, electrical circuits, and the differential
equations describing both static and dynamical processes. The test for positive definiteness
relies  on  Gaussian  Elimination,  and  we  can  reinterpret  the  resulting  matrix  factorization 
as  the  algebraic  process  of  completing  the  square  for  the  associated  quadratic  form.  In 
applications,  positive  definite  matrices  most  often  arise  as  Gram  matrices,  whose  entries 
are  formed  by  taking  inner  products  between  selected  elements  of  an  inner  product  space. 

So  far,  we  have  focussed  our  attention  on  real  vector  spaces.  Complex  numbers,  vectors, 
and  functions  also  arise  in  numerous  applications,  and  so,  in  the  final  section,  we  take  the 
opportunity  to  formally  introduce  complex  vector  spaces.  Most  of  the  theory  proceeds  in 
direct  analogy  with  the  real  version,  but  the  notions  of  inner  product  and  norm  on  complex 
vector  spaces  require  some  thought.  Applications  of  complex  vector  spaces  and  their  inner 
products  are  of  particular  significance  in  Fourier  analysis,  signal  processing,  and  partial 
differential  equations,  [61],  and  they  play  an  absolutely  essential  role  in  modern  quantum 
mechanics,  [54]. 

3.1  Inner  Products 

The most basic example of an inner product is the familiar dot product

$$\mathbf{v} \cdot \mathbf{w} = v_1 w_1 + v_2 w_2 + \cdots + v_n w_n = \sum_{i=1}^{n} v_i w_i, \tag{3.1}$$

between (column) vectors $\mathbf{v} = (v_1, v_2, \ldots, v_n)^T$, $\mathbf{w} = (w_1, w_2, \ldots, w_n)^T$, both lying in the
Euclidean space $\mathbb{R}^n$. A key observation is that the dot product (3.1) is equal to the matrix
product

$$\mathbf{v} \cdot \mathbf{w} = \mathbf{v}^T \mathbf{w} = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}$$

between the row vector $\mathbf{v}^T$ and the column vector $\mathbf{w}$.

Figure 3.1. The Euclidean Norm in $\mathbb{R}^2$ and $\mathbb{R}^3$.

The dot product is the cornerstone of Euclidean geometry. The key fact is that the dot
product of a vector with itself,

$$\mathbf{v} \cdot \mathbf{v} = v_1^2 + v_2^2 + \cdots + v_n^2, \tag{3.2}$$

is the sum of the squares of its entries, and hence, by the classical Pythagorean Theorem,
equals the square of its length; see Figure 3.1. Consequently, the Euclidean norm or length
of a vector is found by taking the square root:

$$\| \mathbf{v} \| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}. \tag{3.3}$$

Note that every nonzero vector, $\mathbf{v} \neq \mathbf{0}$, has positive Euclidean norm, $\| \mathbf{v} \| > 0$, while only
the zero vector has zero norm: $\| \mathbf{v} \| = 0$ if and only if $\mathbf{v} = \mathbf{0}$. The elementary properties
of dot product and Euclidean norm serve to inspire the abstract definition of more general
inner products.
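
As a small illustration of formulas (3.1) and (3.3), the following Python sketch computes a dot product and a Euclidean norm using only the standard library. The function names `dot` and `euclidean_norm` and the sample vectors are choices made here for illustration.

```python
import math

def dot(v, w):
    """Dot product (3.1): the sum of the products of corresponding entries."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of entries")
    return sum(vi * wi for vi, wi in zip(v, w))

def euclidean_norm(v):
    """Euclidean norm (3.3): the square root of the dot product of v with itself."""
    return math.sqrt(dot(v, v))

v = [3.0, 4.0]
w = [1.0, -2.0]
print(dot(v, w))          # 3*1 + 4*(-2) = -5.0
print(euclidean_norm(v))  # sqrt(9 + 16) = 5.0
```

Only the zero vector returns a norm of zero; every other input gives a strictly positive value.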

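Definition 3.1. An inner product on the real vector space V is a pairing that takes two
vectors $\mathbf{v}, \mathbf{w} \in V$ and produces a real number $\langle \mathbf{v}, \mathbf{w} \rangle \in \mathbb{R}$. The inner product is required
to satisfy the following three axioms for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$, and scalars $c, d \in \mathbb{R}$.

(i) Bilinearity:
$$\langle c\,\mathbf{u} + d\,\mathbf{v}, \mathbf{w} \rangle = c\,\langle \mathbf{u}, \mathbf{w} \rangle + d\,\langle \mathbf{v}, \mathbf{w} \rangle, \qquad \langle \mathbf{u}, c\,\mathbf{v} + d\,\mathbf{w} \rangle = c\,\langle \mathbf{u}, \mathbf{v} \rangle + d\,\langle \mathbf{u}, \mathbf{w} \rangle. \tag{3.4}$$

(ii) Symmetry:
$$\langle \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{w}, \mathbf{v} \rangle. \tag{3.5}$$

(iii) Positivity:
$$\langle \mathbf{v}, \mathbf{v} \rangle > 0 \quad \text{whenever} \quad \mathbf{v} \neq \mathbf{0}, \quad \text{while} \quad \langle \mathbf{0}, \mathbf{0} \rangle = 0. \tag{3.6}$$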


A vector space equipped with an inner product is called an inner product space. As we
shall see, a vector space can admit many different inner products. Verification of the inner
product axioms for the Euclidean dot product is straightforward, and left as an exercise
for the reader.

Given an inner product, the associated norm of a vector $\mathbf{v} \in V$ is defined as the positive
square root of the inner product of the vector with itself:

$$\| \mathbf{v} \| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}. \tag{3.7}$$

The positivity axiom implies that $\| \mathbf{v} \| \geq 0$ is real and non-negative, and equals 0 if and
only if $\mathbf{v} = \mathbf{0}$ is the zero vector.


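Example 3.2. While certainly the most common inner product on $\mathbb{R}^2$, the dot product

$$\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v} \cdot \mathbf{w} = v_1 w_1 + v_2 w_2$$

is by no means the only possibility. A simple example is provided by the weighted inner
product

$$\langle \mathbf{v}, \mathbf{w} \rangle = 2\,v_1 w_1 + 5\,v_2 w_2, \qquad \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}, \quad \mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}. \tag{3.8}$$

Let us verify that this formula does indeed define an inner product. The symmetry axiom
(3.5) is immediate. Moreover,

$$\begin{aligned} \langle c\,\mathbf{u} + d\,\mathbf{v}, \mathbf{w} \rangle &= 2\,(c\,u_1 + d\,v_1)\,w_1 + 5\,(c\,u_2 + d\,v_2)\,w_2 \\ &= c\,(2\,u_1 w_1 + 5\,u_2 w_2) + d\,(2\,v_1 w_1 + 5\,v_2 w_2) = c\,\langle \mathbf{u}, \mathbf{w} \rangle + d\,\langle \mathbf{v}, \mathbf{w} \rangle, \end{aligned}$$

which verifies the first bilinearity condition; the second follows by a very similar computation.
(Or, one can use the symmetry axiom to deduce the second bilinearity identity from
the first; see Exercise 3.1.9.) Moreover, $\langle \mathbf{0}, \mathbf{0} \rangle = 0$, while

$$\langle \mathbf{v}, \mathbf{v} \rangle = 2\,v_1^2 + 5\,v_2^2 > 0 \quad \text{whenever} \quad \mathbf{v} \neq \mathbf{0},$$

since at least one of the summands is strictly positive. This establishes (3.8) as a legitimate
inner product on $\mathbb{R}^2$. The associated weighted norm $\| \mathbf{v} \| = \sqrt{2\,v_1^2 + 5\,v_2^2}$ defines an
alternative, “non-Pythagorean” notion of length of vectors and distance between points in
the plane.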

A less evident example of an inner product on $\mathbb{R}^2$ is provided by the expression

$$\langle \mathbf{v}, \mathbf{w} \rangle = v_1 w_1 - v_1 w_2 - v_2 w_1 + 4\,v_2 w_2. \tag{3.9}$$

Bilinearity is verified in the same manner as before, and symmetry is immediate. Positivity
is ensured by noticing that the expression

$$\langle \mathbf{v}, \mathbf{v} \rangle = v_1^2 - 2\,v_1 v_2 + 4\,v_2^2 = (v_1 - v_2)^2 + 3\,v_2^2 \geq 0$$

is always non-negative, and, moreover, is equal to zero if and only if $v_1 - v_2 = 0$, $v_2 = 0$,
i.e., only when $v_1 = v_2 = 0$ and so $\mathbf{v} = \mathbf{0}$. We conclude that (3.9) defines yet another inner
product on $\mathbb{R}^2$, with associated norm

$$\| \mathbf{v} \| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} = \sqrt{v_1^2 - 2\,v_1 v_2 + 4\,v_2^2}.$$

The  second  example  (3.8)  is  a  particular  case  of  a  general  class  of  inner  products. 

Example 3.3. Let $c_1, \ldots, c_n > 0$ be a set of positive numbers. The corresponding
weighted inner product and weighted norm on $\mathbb{R}^n$ are defined by

$$\langle \mathbf{v}, \mathbf{w} \rangle = \sum_{i=1}^{n} c_i\, v_i w_i, \qquad \| \mathbf{v} \| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} = \sqrt{\sum_{i=1}^{n} c_i\, v_i^2}. \tag{3.10}$$

The numbers $c_i > 0$ are the weights. Observe that the larger the weight $c_i$, the more the
$i$th coordinate of $\mathbf{v}$ contributes to the norm. Weighted norms are particularly relevant in
statistics  and  data  fitting,  [43,  87],  when  one  wants  to  emphasize  the  importance  of  certain 
measurements  and  de-emphasize  others;  this  is  done  by  assigning  appropriate  weights  to 
the  different  components  of  the  data  vector  v.  Section  5.4,  on  least  squares  approximation 
methods,  will  contain  further  details. 
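
A minimal Python sketch of the weighted inner product and norm (3.10) follows. The function names `weighted_inner` and `weighted_norm` and the sample data are assumptions made for illustration; the weights 2 and 5 reproduce the inner product (3.8).

```python
import math

def weighted_inner(v, w, weights):
    """Weighted inner product (3.10): the sum of c_i * v_i * w_i with all c_i > 0."""
    if not (len(v) == len(w) == len(weights)):
        raise ValueError("v, w and weights must have the same length")
    if any(c <= 0 for c in weights):
        raise ValueError("all weights must be strictly positive")
    return sum(c * vi * wi for c, vi, wi in zip(weights, v, w))

def weighted_norm(v, weights):
    """Weighted norm: the square root of the weighted inner product of v with itself."""
    return math.sqrt(weighted_inner(v, v, weights))

weights = [2.0, 5.0]  # the weights appearing in (3.8)
v = [1.0, 1.0]
w = [3.0, -1.0]
print(weighted_inner(v, w, weights))  # 2*1*3 + 5*1*(-1) = 1.0
print(weighted_norm(v, weights))      # sqrt(2 + 5) = sqrt(7)
```

Rejecting non-positive weights mirrors the requirement $c_i > 0$, without which the positivity axiom can fail.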


Exercises 


3.1.1. Prove that the formula $\langle \mathbf{v}, \mathbf{w} \rangle = v_1 w_1 - v_1 w_2 - v_2 w_1 + b\,v_2 w_2$ defines an inner product
on $\mathbb{R}^2$ if and only if $b > 1$.

3.1.2. Which of the following formulas for $\langle \mathbf{v}, \mathbf{w} \rangle$ define inner products on $\mathbb{R}^2$?
(a) $2\,v_1 w_1 + 3\,v_2 w_2$, (b) $v_1 w_2 + v_2 w_1$, (c) $(v_1 + v_2)(w_1 + w_2)$, (d) $v_1^2 w_1^2 + v_2^2 w_2^2$,
(e) $\sqrt{v_1^2 + v_2^2}\,\sqrt{w_1^2 + w_2^2}$, (f) $2\,v_1 w_1 + (v_1 - v_2)(w_1 - w_2)$,
(g) $4\,v_1 w_1 - 2\,v_1 w_2 - 2\,v_2 w_1 + 4\,v_2 w_2$.

3.1.3. Show that $\langle \mathbf{v}, \mathbf{w} \rangle = v_1 w_1 + v_1 w_2 + v_2 w_1 + v_2 w_2$ does not define an inner product on
$\mathbb{R}^2$.

3.1.4. Prove that each of the following formulas for $\langle \mathbf{v}, \mathbf{w} \rangle$ defines an inner product on $\mathbb{R}^3$.
Verify all the inner product axioms in careful detail:
(a) $v_1 w_1 + 2\,v_2 w_2 + 3\,v_3 w_3$, (b) $4\,v_1 w_1 + 2\,v_1 w_2 + 2\,v_2 w_1 + 4\,v_2 w_2 + v_3 w_3$,
(c) $2\,v_1 w_1 - 2\,v_1 w_2 - 2\,v_2 w_1 + 3\,v_2 w_2 - v_2 w_3 - v_3 w_2 + 2\,v_3 w_3$.

3.1.5. The unit circle for an inner product on $\mathbb{R}^2$ is defined as the set of all vectors of unit
length: $\| \mathbf{v} \| = 1$. Graph the unit circles for (a) the Euclidean inner product, (b) the
weighted inner product (3.8), (c) the non-standard inner product (3.9). (d) Prove that
cases (b) and (c) are, in fact, both ellipses.

0 3.1.6. (a) Explain why the formula for the Euclidean norm in $\mathbb{R}^2$ follows from the Pythagorean
Theorem. (b) How do you use the Pythagorean Theorem to justify the formula for the
Euclidean norm in $\mathbb{R}^3$? Hint: Look at Figure 3.1.

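0 3.1.7. Prove that the norm on an inner product space satisfies $\| c\,\mathbf{v} \| = |c|\,\| \mathbf{v} \|$ for every scalar
c and vector v.

3.1.8. Prove that $\langle a\,\mathbf{v} + b\,\mathbf{w}, c\,\mathbf{v} + d\,\mathbf{w} \rangle = ac\,\| \mathbf{v} \|^2 + (ad + bc)\,\langle \mathbf{v}, \mathbf{w} \rangle + bd\,\| \mathbf{w} \|^2$.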


0  3.1.9.  Prove  that  the  second  bilinearity  formula  (3.4)  is  a  consequence  of  the  first  and  the  other 
two  inner  product  axioms. 

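0 3.1.10. Let V be an inner product space. (a) Prove that $\langle \mathbf{x}, \mathbf{v} \rangle = 0$ for all $\mathbf{v} \in V$ if and
only if $\mathbf{x} = \mathbf{0}$. (b) Prove that $\langle \mathbf{x}, \mathbf{v} \rangle = \langle \mathbf{y}, \mathbf{v} \rangle$ for all $\mathbf{v} \in V$ if and only if $\mathbf{x} = \mathbf{y}$.
(c) Let $\mathbf{v}_1, \ldots, \mathbf{v}_n$ be a basis for V. Prove that $\langle \mathbf{x}, \mathbf{v}_i \rangle = \langle \mathbf{y}, \mathbf{v}_i \rangle$, $i = 1, \ldots, n$,
if and only if $\mathbf{x} = \mathbf{y}$.

0 3.1.11. Prove that $\mathbf{x} \in \mathbb{R}^n$ solves the linear system $A\mathbf{x} = \mathbf{b}$ if and only if
$\mathbf{x}^T A^T \mathbf{v} = \mathbf{b}^T \mathbf{v}$ for all $\mathbf{v} \in \mathbb{R}^m$.
The latter is known as the weak formulation of the linear system, and its generalizations are
of great importance in the study of differential equations and numerical analysis, [61].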


0 3.1.12. (a) Prove the identity

$$\langle \mathbf{u}, \mathbf{v} \rangle = \tfrac{1}{4} \left( \| \mathbf{u} + \mathbf{v} \|^2 - \| \mathbf{u} - \mathbf{v} \|^2 \right), \tag{3.11}$$

which allows one to reconstruct an inner product from its norm. (b) Use (3.11) to find the
inner product on $\mathbb{R}^2$ corresponding to the norm $\| \mathbf{v} \| = \sqrt{v_1^2 - 3\,v_1 v_2 + 5\,v_2^2}$.

3.1.13. (a) Show that, for all vectors x and y in an inner product space,

$$\| \mathbf{x} + \mathbf{y} \|^2 + \| \mathbf{x} - \mathbf{y} \|^2 = 2 \left( \| \mathbf{x} \|^2 + \| \mathbf{y} \|^2 \right).$$

(b) Interpret this result pictorially for vectors in $\mathbb{R}^2$ under the Euclidean norm.

3.1.14. Suppose u, v satisfy $\| \mathbf{u} \| = 3$, $\| \mathbf{u} + \mathbf{v} \| = 4$, and $\| \mathbf{u} - \mathbf{v} \| = 6$. What must $\| \mathbf{v} \|$ equal?
Does your answer depend upon which norm is being used?


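3.1.15. Let A be any n × n matrix. Prove that the dot product identity $\mathbf{v} \cdot (A\mathbf{w}) = (A^T \mathbf{v}) \cdot \mathbf{w}$
is valid for all vectors $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$.

0 3.1.16. Prove that $A = A^T$ is a symmetric n × n matrix if and only if $(A\mathbf{v}) \cdot \mathbf{w} = \mathbf{v} \cdot (A\mathbf{w})$ for
all $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$.

3.1.17. Prove that $\langle A, B \rangle = \operatorname{tr}(A^T B)$ defines an inner product on the vector space $M_{n \times n}$ of
real n × n matrices.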


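3.1.18. Suppose $\langle \mathbf{v}, \mathbf{w} \rangle$ defines an inner product on a vector space V. Explain why it also
defines an inner product on every subspace $W \subset V$.

3.1.19. Prove that if $\langle \mathbf{v}, \mathbf{w} \rangle$ and $\langle\langle \mathbf{v}, \mathbf{w} \rangle\rangle$ are two inner products on the same vector space V,
then their sum $\langle\langle\langle \mathbf{v}, \mathbf{w} \rangle\rangle\rangle = \langle \mathbf{v}, \mathbf{w} \rangle + \langle\langle \mathbf{v}, \mathbf{w} \rangle\rangle$ defines an inner product on V.

0 3.1.20. Let V and W be inner product spaces with respective inner products $\langle \mathbf{v}, \tilde{\mathbf{v}} \rangle$ and
$\langle\langle \mathbf{w}, \tilde{\mathbf{w}} \rangle\rangle$. Show that $\langle\langle\langle (\mathbf{v}, \mathbf{w}), (\tilde{\mathbf{v}}, \tilde{\mathbf{w}}) \rangle\rangle\rangle = \langle \mathbf{v}, \tilde{\mathbf{v}} \rangle + \langle\langle \mathbf{w}, \tilde{\mathbf{w}} \rangle\rangle$ for $\mathbf{v}, \tilde{\mathbf{v}} \in V$, $\mathbf{w}, \tilde{\mathbf{w}} \in W$,
defines an inner product on their Cartesian product V × W.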


Inner  Products  on  Function  Spaces 


Inner  products  and  norms  on  function  spaces  lie  at  the  foundation  of  modern  analysis 
and  its  applications,  particularly  Fourier  analysis,  boundary  value  problems,  ordinary  and 
partial  differential  equations,  and  numerical  analysis.  Let  us  introduce  the  most  important 
examples. 

Example 3.4. Let $[a, b] \subset \mathbb{R}$ be a bounded closed interval. Consider the vector space
$C^0[a, b]$ consisting of all continuous scalar functions f defined on the interval [a, b]. The
integral of the product of two continuous functions,

$$\langle f, g \rangle = \int_a^b f(x)\,g(x)\,dx, \tag{3.12}$$

defines an inner product on the vector space $C^0[a, b]$, as we shall prove below. The associated
norm is, according to the basic definition (3.7),

$$\| f \| = \sqrt{\int_a^b f(x)^2\,dx}, \tag{3.13}$$

and is known as the $L^2$ norm of the function f over the interval [a, b]. The $L^2$ inner
product and norm of functions can be viewed as the infinite-dimensional function space
versions of the dot product and Euclidean norm of vectors in $\mathbb{R}^n$. The reason for the name
$L^2$ will become clearer later on.

For example, if we take $[a, b] = [0, \tfrac{1}{2}\pi]$, then the $L^2$ inner product between $f(x) = \sin x$
and $g(x) = \cos x$ is equal to

$$\langle \sin x, \cos x \rangle = \int_0^{\pi/2} \sin x \cos x\,dx = \tfrac{1}{2} \sin^2 x\,\Big|_{x=0}^{\pi/2} = \tfrac{1}{2}.$$

Similarly, the norm of the function sin x is

$$\| \sin x \| = \sqrt{\int_0^{\pi/2} (\sin x)^2\,dx} = \sqrt{\frac{\pi}{4}}.$$

One must always be careful when evaluating function norms. For example, the constant
function $c(x) \equiv 1$ has norm

$$\| 1 \| = \sqrt{\int_0^{\pi/2} 1^2\,dx} = \sqrt{\frac{\pi}{2}},$$

not 1 as you might have expected. We also note that the value of the norm depends upon
which interval the integral is taken over. For instance, on the longer interval $[0, \pi]$,

$$\| 1 \| = \sqrt{\int_0^{\pi} 1^2\,dx} = \sqrt{\pi}.$$

Thus, when dealing with the $L^2$ inner product or norm, one must always be careful to
specify the function space, or, equivalently, the interval on which it is being evaluated.
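
The values quoted above for sin x, cos x, and the constant function 1 can be reproduced approximately by numerical quadrature. The sketch below uses composite Simpson's rule, which is a choice made here for illustration and is not prescribed by the text; the names `simpson`, `l2_inner`, and `l2_norm` are likewise hypothetical.

```python
import math

def simpson(func, a, b, n=1000):
    """Approximate the integral of func over [a, b] by composite Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def l2_inner(f, g, a, b):
    """Approximate the L^2 inner product (3.12) of f and g on [a, b]."""
    return simpson(lambda x: f(x) * g(x), a, b)

def l2_norm(f, a, b):
    """Approximate the L^2 norm (3.13) of f on [a, b]."""
    return math.sqrt(l2_inner(f, f, a, b))

def one(x):
    return 1.0

half_pi = math.pi / 2
print(l2_inner(math.sin, math.cos, 0.0, half_pi))  # close to 1/2
print(l2_norm(math.sin, 0.0, half_pi))             # close to sqrt(pi/4)
print(l2_norm(one, 0.0, half_pi))                  # close to sqrt(pi/2)
print(l2_norm(one, 0.0, math.pi))                  # close to sqrt(pi)
```

The last two lines show the dependence on the interval: the same constant function has a different norm on $[0, \tfrac{1}{2}\pi]$ than on $[0, \pi]$.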

Let us prove that formula (3.12) does, indeed, define an inner product. First, we need
to check that $\langle f, g \rangle$ is well defined. This follows because the product f(x) g(x) of two
continuous functions is also continuous, and hence its integral over a bounded interval is
defined and finite. The symmetry requirement is immediate:

$$\langle f, g \rangle = \int_a^b f(x)\,g(x)\,dx = \int_a^b g(x)\,f(x)\,dx = \langle g, f \rangle,$$

because multiplication of functions is commutative. The first bilinearity axiom

$$\langle c f + d g, h \rangle = c\,\langle f, h \rangle + d\,\langle g, h \rangle$$

amounts to the following elementary integral identity

$$\int_a^b \bigl[\, c\,f(x) + d\,g(x) \,\bigr]\,h(x)\,dx = c \int_a^b f(x)\,h(x)\,dx + d \int_a^b g(x)\,h(x)\,dx,$$

valid for arbitrary continuous functions f, g, h and scalars (constants) c, d. The second
bilinearity axiom is proved similarly; alternatively, one can use symmetry to deduce it
from the first as in Exercise 3.1.9. Finally, positivity requires that

$$\| f \|^2 = \langle f, f \rangle = \int_a^b f(x)^2\,dx \geq 0.$$

This is clear because $f(x)^2 \geq 0$, and the integral of a nonnegative function is nonnegative.
Moreover, since the function $f(x)^2$ is continuous and nonnegative, its integral will vanish,
$\int_a^b f(x)^2\,dx = 0$, if and only if $f(x) \equiv 0$ is the zero function, cf. Exercise 3.1.29. This
completes the proof that (3.12) defines a bona fide inner product on the space $C^0[a, b]$.


Remark. The $L^2$ inner product formula can also be applied to more general functions, but
we have restricted our attention to continuous functions in order to avoid certain technical
complications. The most general function space admitting this inner product is known
as Hilbert space, which forms the basis of much of modern analysis, function theory, and
Fourier analysis, as well as providing the theoretical setting for all of quantum mechanics,
[54]. Unfortunately, we cannot provide the mathematical details of the Hilbert space
construction, since it requires that you be familiar with measure theory and the Lebesgue
integral. See [61] for a basic introduction and [19, 68, 77] for the fully rigorous theory.


Warning. One needs to be extremely careful when trying to extend the $L^2$ inner product
to other spaces of functions. Indeed, there are nonzero discontinuous functions with zero
“$L^2$ norm”. For example, the function

$$f(x) = \begin{cases} 1, & x = 0, \\ 0, & \text{otherwise}, \end{cases} \qquad \text{satisfies} \qquad \| f \|^2 = \int_{-1}^{1} f(x)^2\,dx = 0, \tag{3.14}$$

because every function that is zero except at finitely many (or even countably many) points
has zero integral.


The $L^2$ inner product is but one of a vast number of possible inner products on function
spaces. For example, one can also define weighted inner products on the space $C^0[a, b]$.
The weighting along the interval is specified by a (continuous) positive scalar function
$w(x) > 0$. The corresponding weighted inner product and norm are

$$\langle f, g \rangle = \int_a^b f(x)\,g(x)\,w(x)\,dx, \qquad \| f \| = \sqrt{\int_a^b f(x)^2\,w(x)\,dx}. \tag{3.15}$$

The verification of the inner product axioms in this case is left as an exercise for the reader.
As in the finite-dimensional version, weighted inner products are often used in statistics
and data analysis, [20, 43, 87].
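
As a hedged illustration of (3.15), the sketch below evaluates a weighted inner product and norm numerically. The weight $w(x) = 1 + x$ on [0, 1], the sample functions, the helper names, and the use of Simpson's rule are all assumptions chosen here for the example; none of them comes from the text.

```python
import math

def simpson(func, a, b, n=100):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def weighted_l2_inner(f, g, weight, a, b):
    """Approximate the weighted inner product (3.15) of f and g on [a, b]."""
    return simpson(lambda x: f(x) * g(x) * weight(x), a, b)

def weighted_l2_norm(f, weight, a, b):
    """Approximate the weighted norm in (3.15) of f on [a, b]."""
    return math.sqrt(weighted_l2_inner(f, f, weight, a, b))

def weight(x):
    return 1.0 + x  # hypothetical weight, strictly positive on [0, 1]

def identity(x):
    return x

def one(x):
    return 1.0

# Integral of x (1 + x) over [0, 1] is 1/2 + 1/3 = 5/6.
print(weighted_l2_inner(identity, one, weight, 0.0, 1.0))
# Integral of (1 + x) over [0, 1] is 3/2, so the norm of 1 is sqrt(3/2).
print(weighted_l2_norm(one, weight, 0.0, 1.0))
```

Simpson's rule integrates polynomials of degree at most three exactly (up to rounding), so both values agree with the hand computation in the comments.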


Exercises 


3.1.21. For each of the given pairs of functions in $C^0[0, 1]$, find their $L^2$ inner product
$\langle f, g \rangle$ and their $L^2$ norms $\| f \|$, $\| g \|$: (a) $f(x) = 1$, $g(x) = x$; (b) $f(x) = \cos 2\pi x$,
$g(x) = \sin 2\pi x$; (c) $f(x) = x$, $g(x) = e^x$; (d) $f(x) = (x + 1)^2$, $g(x) = \dfrac{1}{x + 1}$.


3.1.22. Let $f(x) = x$, $g(x) = 1 + x^2$. Compute $\langle f, g \rangle$, $\| f \|$, and $\| g \|$ for (a) the $L^2$ inner
product $\langle f, g \rangle = \int_0^1 f(x)\,g(x)\,dx$; (b) the $L^2$ inner product $\langle f, g \rangle = \int_{-1}^{1} f(x)\,g(x)\,dx$;
(c) the weighted inner product $\langle f, g \rangle = \int_0^1 f(x)\,g(x)\,x\,dx$.

3.1.23. Which of the following formulas for $\langle f, g \rangle$ define inner products on the space
$C^0[-1, 1]$? (a) $\int_{-1}^{1} f(x)\,g(x)\,e^{-x}\,dx$, (b) $\int_{-1}^{1} f(x)\,g(x)\,x\,dx$,
(c) $\int_{-1}^{1} f(x)\,g(x)\,(x + 2)\,dx$, (d) $\int_{-1}^{1} f(x)\,g(x)\,x^2\,dx$.

3.1.24. Prove that $\langle f, g \rangle = \int_0^1 f(x)\,g(x)\,dx$ does not define an inner product on the vector
space $C^0[-1, 1]$. Explain why this does not contradict the fact that it defines an inner
product on the vector space $C^0[0, 1]$. Does it define an inner product on the subspace
$\mathcal{P}^{(\infty)} \subset C^0[-1, 1]$ consisting of all polynomial functions?

3.1.25. Does either of the following define an inner product on $C^0[0, 1]$?
(a) $\langle f, g \rangle = f(0)\,g(0) + f(1)\,g(1)$, (b) $\langle f, g \rangle = f(0)\,g(0) + f(1)\,g(1) + \int_0^1 f(x)\,g(x)\,dx$.

3.1.26. Let f(x) be a function, and $\| f \|$ its $L^2$ norm on [a, b]. Is $\| f^2 \| = \| f \|^2$? If yes, prove
the statement. If no, give a counterexample.

0 3.1.27. Prove that $\langle f, g \rangle = \int_a^b \bigl[\, f(x)\,g(x) + f'(x)\,g'(x) \,\bigr]\,dx$ defines an inner product on the
space $C^1[a, b]$ of continuously differentiable functions on the interval [a, b]. Write out the
corresponding norm, known as the Sobolev $H^1$ norm; it and its generalizations play an
extremely important role in advanced mathematical analysis, [49].

3.1.28. Let $V = C^1[-1, 1]$ denote the vector space of continuously differentiable functions for
$-1 \leq x \leq 1$. (a) Does the expression $\langle f, g \rangle = \int_{-1}^{1} f'(x)\,g'(x)\,dx$ define an inner product on
V? (b) Answer the same question for the subspace $W = \{\, f \in V \mid f(0) = 0 \,\}$ consisting of
all continuously differentiable functions that vanish at 0.

0 3.1.29. (a) Let $h(x) \geq 0$ be a continuous, non-negative function defined on an interval [a, b].
Prove that $\int_a^b h(x)\,dx = 0$ if and only if $h(x) \equiv 0$. Hint: Use the fact that $\int_c^d h(x)\,dx > 0$ if
$h(x) > 0$ for $c \leq x \leq d$. (b) Give an example that shows that this result is not valid if h is
allowed to be discontinuous.

0 3.1.30. (a) Prove the inner product axioms for the weighted inner product (3.15), assuming
$w(x) > 0$ for all $a \leq x \leq b$. (b) Explain why it does not define an inner product if w is
continuous and $w(x_0) < 0$ for some $x_0 \in [a, b]$. (c) If $w(x) \geq 0$ for $a \leq x \leq b$, does (3.15)
define an inner product? Hint: Your answer may depend upon w(x).

G 3.1.31. Let $\Omega \subset \mathbb{R}^2$ be a closed bounded subset. Let $C^0(\Omega)$ denote the vector space consisting
of all continuous, bounded real-valued functions f(x, y) defined for $(x, y) \in \Omega$. (a) Prove
that if $f(x, y) \geq 0$ is continuous and $\iint_\Omega f(x, y)\,dx\,dy = 0$, then $f(x, y) \equiv 0$. Hint: Mimic
Exercise 3.1.29. (b) Use this result to prove that

$$\langle f, g \rangle = \iint_\Omega f(x, y)\,g(x, y)\,dx\,dy \tag{3.16}$$

defines an inner product on $C^0(\Omega)$, called the $L^2$ inner product on the domain $\Omega$. What is
the corresponding norm?

3.1.32. Compute the $L^2$ inner product (3.16) and norms of the functions $f(x, y) \equiv 1$ and
$g(x, y) = x^2 + y^2$, when (a) $\Omega = \{\, 0 \leq x \leq 1,\ 0 \leq y \leq 1 \,\}$ is the unit square;
(b) $\Omega = \{\, x^2 + y^2 \leq 1 \,\}$ is the unit disk. Hint: Use polar coordinates.

C 3.1.33. Let V be the vector space consisting of all continuous, vector-valued functions
$\mathbf{f}(x) = (\, f_1(x), f_2(x) \,)^T$ defined on the interval $0 \leq x \leq 1$.
(a) Prove that $\langle\langle \mathbf{f}, \mathbf{g} \rangle\rangle = \int_0^1 \bigl[\, f_1(x)\,g_1(x) + f_2(x)\,g_2(x) \,\bigr]\,dx$ defines an inner product on V.
(b) Prove, more generally, that if $\langle \mathbf{v}, \mathbf{w} \rangle$ is any inner product on $\mathbb{R}^2$, then
$\langle\langle \mathbf{f}, \mathbf{g} \rangle\rangle = \int_a^b \langle \mathbf{f}(x), \mathbf{g}(x) \rangle\,dx$ defines an inner product on V. (Part (a) corresponds to the
dot product.) (c) Use part (b) to prove that

$$\langle\langle \mathbf{f}, \mathbf{g} \rangle\rangle = \int_a^b \bigl[\, f_1(x)\,g_1(x) - f_1(x)\,g_2(x) - f_2(x)\,g_1(x) + 3\,f_2(x)\,g_2(x) \,\bigr]\,dx$$

defines an inner product on V.


Figure 3.2. Angle Between Two Vectors.


3.2  Inequalities 

There are two absolutely basic inequalities that are valid for any inner product space.
The first is inspired by the geometric interpretation of the dot product on Euclidean space
in terms of the angle between vectors. It is named after two of the founders of modern
analysis, the nineteenth-century mathematicians Augustin Cauchy, of France, and Hermann
Schwarz, of Germany, who established it in the case of the $L^2$ inner product on function
space.† The more familiar triangle inequality, that the length of any side of a triangle
is bounded by the sum of the lengths of the other two sides, is, in fact, an immediate
consequence of the Cauchy-Schwarz inequality, and hence also valid for any norm based
on an inner product.

We  will  present  these  two  inequalities  in  their  most  general,  abstract  form,  since  this 
brings  their  essence  into  the  limelight.  Specializing  to  different  inner  products  and  norms 
on  both  finite-dimensional  and  infinite-dimensional  vector  spaces  leads  to  a  wide  variety  of 
striking  and  useful  inequalities. 


The  Cauchy-Schwarz  Inequality 


In Euclidean geometry, the dot product between two vectors $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$ can be geometrically
characterized by the equation

$$\mathbf{v} \cdot \mathbf{w} = \| \mathbf{v} \|\,\| \mathbf{w} \| \cos \theta, \tag{3.17}$$

where $\theta = \measuredangle(\mathbf{v}, \mathbf{w})$ measures the angle between the two vectors, as illustrated in Figure 3.2.
Since $|\cos \theta| \leq 1$, the absolute value of the dot product is bounded by the product of the
lengths of the vectors:

$$| \mathbf{v} \cdot \mathbf{w} | \leq \| \mathbf{v} \|\,\| \mathbf{w} \|.$$

This is the simplest form of the general Cauchy-Schwarz inequality. We present a direct
algebraic proof that does not rely on the geometrical notions of length and angle and thus
demonstrates its universal validity for any inner product.


Theorem 3.5. Every inner product satisfies the Cauchy-Schwarz inequality

$$|\langle v, w \rangle| \le \|v\| \, \|w\| \quad \text{for all} \quad v, w \in V. \tag{3.18}$$

Here, $\|v\|$ is the associated norm, while $|\cdot|$ denotes the absolute value of a real number.
Equality holds in (3.18) if and only if v and w are parallel vectors.


† Russians also give credit for its discovery to their compatriot Viktor Bunyakovsky, and, indeed,
some authors append his name to the inequality.

Proof: The case when $w = 0$ is trivial, since both sides of (3.18) are equal to 0. Thus, we
concentrate on the case when $w \ne 0$. Let $t \in \mathbb{R}$ be an arbitrary scalar. Using the three
inner product axioms, we have

$$0 \le \|v + t w\|^2 = \langle v + t w, v + t w \rangle = \langle v, v \rangle + 2t \langle v, w \rangle + t^2 \langle w, w \rangle = \|v\|^2 + 2t \langle v, w \rangle + t^2 \|w\|^2, \tag{3.19}$$


with equality holding if and only if $v = -t w$, which requires v and w to be parallel
vectors. We fix v and w, and consider the right-hand side of (3.19) as a quadratic function
of the scalar variable t:

$$0 \le p(t) = a t^2 + 2 b t + c, \quad \text{where} \quad a = \|w\|^2, \quad b = \langle v, w \rangle, \quad c = \|v\|^2.$$

To get the maximum mileage out of the fact that $p(t) \ge 0$, let us look at where it assumes
its minimum, which occurs when its derivative is zero:

$$p'(t) = 2 a t + 2 b = 0, \quad \text{and so} \quad t = -\frac{b}{a} = -\frac{\langle v, w \rangle}{\|w\|^2}.$$

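Substituting this particular value of t into (3.19), we obtain

$$0 \le \|v\|^2 - 2\,\frac{\langle v, w \rangle^2}{\|w\|^2} + \frac{\langle v, w \rangle^2}{\|w\|^2} = \|v\|^2 - \frac{\langle v, w \rangle^2}{\|w\|^2}.$$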

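Rearranging this last inequality, we conclude that

$$\frac{\langle v, w \rangle^2}{\|w\|^2} \le \|v\|^2, \quad \text{or} \quad \langle v, w \rangle^2 \le \|v\|^2 \, \|w\|^2.$$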

Also,  as  noted  above,  equality  holds  if  and  only  if  v  and  w  are  parallel.  Equality  also  holds 
when  w  =  0,  which  is  of  course  parallel  to  every  vector  v.  Taking  the  (positive)  square 
root  of  both  sides  of  the  final  inequality  completes  the  proof  of  (3.18).  Q.E.D. 
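
The key step of the proof can be checked numerically. The following is a minimal Python sketch, assuming vectors are stored as plain lists and the inner product is a weighted sum with positive weights; the helper names `inner`, `cauchy_schwarz_gap`, and `minimizing_t` are illustrative choices, not notation from the text. It evaluates the minimizing value $t = -\langle v, w \rangle / \|w\|^2$ and the resulting minimum of $p(t)$, which should agree with $\|v\|^2 - \langle v, w \rangle^2 / \|w\|^2$.

```python
def inner(v, w, weights=None):
    """Weighted inner product sum(c_i * v_i * w_i); the dot product if weights is None."""
    if weights is None:
        weights = [1] * len(v)
    return sum(c * a * b for c, a, b in zip(weights, v, w))


def cauchy_schwarz_gap(v, w, weights=None):
    """Return ||v||^2 ||w||^2 - <v,w>^2, which is non-negative by Cauchy-Schwarz."""
    return (inner(v, v, weights) * inner(w, w, weights)
            - inner(v, w, weights) ** 2)


def minimizing_t(v, w, weights=None):
    """Return t = -<v,w> / ||w||^2, where p(t) = ||v + t w||^2 is smallest (w != 0)."""
    return -inner(v, w, weights) / inner(w, w, weights)


v, w = [3, -1, 2], [1, -1, 1]
t_star = minimizing_t(v, w)
shifted = [a + t_star * b for a, b in zip(v, w)]   # the vector v + t* w
p_min = inner(shifted, shifted)                    # minimum value of p(t)

print(t_star, p_min, cauchy_schwarz_gap(v, w))
print(cauchy_schwarz_gap([1, 2], [2, 4]))          # parallel vectors: the gap vanishes
```

The gap is zero exactly when the minimum of $p(t)$ is zero, which is the parallel case singled out in the theorem.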


Given any inner product, we can use the quotient

$$\cos\theta = \frac{\langle v, w \rangle}{\|v\| \, \|w\|} \tag{3.20}$$

to define the “angle” $\theta = \angle(v, w)$ between the nonzero vector space elements $v, w \in V$. The
Cauchy-Schwarz inequality tells us that the ratio lies between $-1$ and $+1$, and hence the
angle $\theta$ is well defined modulo $2\pi$, and, in fact, unique if we restrict it to lie in the range
$0 \le \theta \le \pi$.

For example, the vectors $v = (1, 0, 1)^T$, $w = (0, 1, 1)^T$ have dot product $v \cdot w = 1$ and
norms $\|v\| = \|w\| = \sqrt{2}$. Hence the Euclidean angle between them is given by

$$\cos\theta = \frac{1}{\sqrt{2} \cdot \sqrt{2}} = \frac{1}{2}, \quad \text{and so} \quad \theta = \angle(v, w) = \tfrac{1}{3}\pi = 1.0471\ldots.$$


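On the other hand, if we adopt the weighted inner product $\langle v, w \rangle = v_1 w_1 + 2 v_2 w_2 + 3 v_3 w_3$,
then $\langle v, w \rangle = 3$, $\|v\| = 2$, $\|w\| = \sqrt{5}$, and hence their “weighted” angle becomes

$$\cos\theta = \frac{3}{2\sqrt{5}} = .67082\ldots, \quad \text{with} \quad \theta = \angle(v, w) = .83548\ldots.$$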


Thus,  the  measurement  of  angle  (and  length)  depends  on  the  choice  of  an  underlying  inner 
product. 
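
The dependence of the angle on the inner product is easy to reproduce. Below is a small Python sketch, assuming a diagonal weighted inner product with positive weights; the function name `angle` and its `weights` parameter are hypothetical helpers introduced only for illustration.

```python
import math


def angle(v, w, weights=None):
    """Angle between v and w from cos(theta) = <v,w> / (||v|| ||w||)."""
    if weights is None:
        weights = [1.0] * len(v)

    def ip(x, y):
        return sum(c * a * b for c, a, b in zip(weights, x, y))

    cos_theta = ip(v, w) / math.sqrt(ip(v, v) * ip(w, w))
    # Guard against rounding pushing the ratio slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))


v, w = [1, 0, 1], [0, 1, 1]
theta_dot = angle(v, w)                    # Euclidean dot product
theta_weighted = angle(v, w, [1, 2, 3])    # <v,w> = v1 w1 + 2 v2 w2 + 3 v3 w3
print(theta_dot, theta_weighted)
```

The two printed values can be compared with the angles $\tfrac{1}{3}\pi$ and $.83548\ldots$ computed by hand above.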
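Similarly, under the $L^2$ inner product on the interval $[0, 1]$, the “angle” $\theta$ between the
polynomials $p(x) = x$ and $q(x) = x^2$ is given by

$$\cos\theta = \frac{\langle x, x^2 \rangle}{\|x\| \, \|x^2\|} = \frac{\int_0^1 x^3 \, dx}{\sqrt{\int_0^1 x^2 \, dx} \, \sqrt{\int_0^1 x^4 \, dx}} = \frac{1/4}{\sqrt{1/3} \, \sqrt{1/5}} = \sqrt{\frac{15}{16}},$$

so that $\theta = \angle(p, q) = .25268\ldots$ radians.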


Warning. You should not try to give this notion of angle between functions more significance
than the formal definition warrants; it does not correspond to any “angular”
properties of their graphs. Also, the value depends on the choice of inner product and
the interval upon which it is being computed. For example, if we change to the $L^2$ inner
product on the interval $[-1, 1]$, then $\langle x, x^2 \rangle = \int_{-1}^{1} x^3 \, dx = 0$. Hence, (3.20) becomes
$\cos\theta = 0$, so the “angle” between $x$ and $x^2$ is now $\theta = \angle(p, q) = \tfrac{1}{2}\pi$.
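
The interval dependence can be verified with exact arithmetic. The sketch below is one possible way to do so, assuming polynomials are represented as coefficient lists with the constant term first; the names `poly_mul`, `l2_inner`, and `l2_angle` are illustrative and not taken from the text.

```python
import math
from fractions import Fraction


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (constant term first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += Fraction(a) * Fraction(b)
    return out


def l2_inner(p, q, a, b):
    """Exact L2 inner product: the integral of p(x) q(x) over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(poly_mul(p, q)))


def l2_angle(p, q, a, b):
    """Angle defined by cos(theta) = <p,q> / (||p|| ||q||)."""
    cos_theta = float(l2_inner(p, q, a, b)) / math.sqrt(
        float(l2_inner(p, p, a, b) * l2_inner(q, q, a, b)))
    return math.acos(max(-1.0, min(1.0, cos_theta)))


x, x_squared = [0, 1], [0, 0, 1]
print(l2_angle(x, x_squared, 0, 1))     # interval [0, 1]
print(l2_angle(x, x_squared, -1, 1))    # interval [-1, 1]
```

Using `Fraction` keeps the inner products exact, so the vanishing of $\langle x, x^2 \rangle$ on the symmetric interval is detected without rounding error.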


Exercises 


3.2.1.  Verify  the  Cauchy-Schwarz  inequality  for  each  of  the  following  pairs  of  vectors  v,w, 
using  the  standard  dot  product,  and  then  determine  the  angle  between  them: 


(a) $(1, 2)^T$, $(-1, 2)^T$, (b) $(1, -1, 0)^T$, $(-1, 0, 1)^T$, (c) $(1, -1, 0)^T$, $(2, 2, 2)^T$,
(d) $(1, -1, 1, 0)^T$, $(-2, 0, -1, 1)^T$, (e) $(2, 1, -2, -1)^T$, $(0, -1, 2, -1)^T$.

3.2.2. (a) Find the Euclidean angle between the vectors $(1, 1, 1, 1)^T$ and $(1, 1, 1, -1)^T$ in $\mathbb{R}^4$.
(b) List the possible angles between $(1, 1, 1, 1)^T$ and $(a_1, a_2, a_3, a_4)^T$, where each $a_i$ is
either 1 or $-1$.

3.2.3. Prove that the points (0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1) form the vertices of a regular
tetrahedron, meaning that all sides have the same length. What is the common Euclidean
angle between the edges? What is the angle between any two rays going from the
center $(\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2})$ to the vertices? Remark. Methane molecules assume this geometric
configuration, and the angle influences their chemistry.

3.2.4. Verify the Cauchy-Schwarz inequality for the vectors $v = (1, 2)^T$, $w = (1, -3)^T$, using
(a) the dot product; (b) the weighted inner product $\langle v, w \rangle = v_1 w_1 + 2 v_2 w_2$;
(c) the inner product (3.9).


3.2.5. Verify the Cauchy-Schwarz inequality for the vectors $v = (3, -1, 2)^T$, $w = (1, -1, 1)^T$,
using (a) the dot product; (b) the weighted inner product $\langle v, w \rangle = v_1 w_1 + 2 v_2 w_2 + 3 v_3 w_3$;
(c) the inner product

$$\langle v, w \rangle = v^T \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix} w.$$

♦ 3.2.6. Show that one can determine the angle $\theta$ between v and w via the formula

$$\cos\theta = \frac{\|v + w\|^2 - \|v - w\|^2}{4 \, \|v\| \, \|w\|}.$$

Draw a picture illustrating what is being measured.

♦ 3.2.7. The Law of Cosines: Prove that the formula

$$\|v - w\|^2 = \|v\|^2 + \|w\|^2 - 2 \, \|v\| \, \|w\| \cos\theta, \tag{3.21}$$

where $\theta$ is the angle between v and w, is valid in every inner product space.
3.2.8. Use the Cauchy-Schwarz inequality to prove $(a \cos\theta + b \sin\theta)^2 \le a^2 + b^2$ for any $\theta, a, b$.

3.2.9. Prove that $(a_1 + a_2 + \cdots + a_n)^2 \le n \, (a_1^2 + a_2^2 + \cdots + a_n^2)$ for any real numbers
$a_1, \ldots, a_n$. When does equality hold?

♥ 3.2.10. The cross product of two vectors in $\mathbb{R}^2$ is defined as the scalar

$$v \times w = v_1 w_2 - v_2 w_1 \quad \text{for} \quad v = (v_1, v_2)^T, \quad w = (w_1, w_2)^T. \tag{3.22}$$

(a) Does the cross product define an inner product on $\mathbb{R}^2$? Carefully explain which axioms
are valid and which are not. (b) Prove that $v \times w = \|v\| \, \|w\| \sin\theta$, where $\theta$ denotes the
angle from v to w as in Figure 3.2. (c) Prove that $v \times w = 0$ if and only if v and w are
parallel vectors. (d) Show that $|v \times w|$ equals the area of the parallelogram defined by v
and w.

♦ 3.2.11. Explain why the inequality $\langle v, w \rangle \le \|v\| \, \|w\|$, obtained by omitting the absolute
value sign on the left-hand side of Cauchy-Schwarz, is valid.

3.2.12. Verify the Cauchy-Schwarz inequality for the functions $f(x) = x$ and $g(x) = e^x$ with
respect to (a) the $L^2$ inner product on the interval $[0, 1]$, (b) the $L^2$ inner product on
$[-1, 1]$, (c) the weighted inner product $\langle f, g \rangle = \int_0^1 f(x) \, g(x) \, e^{-x} \, dx$.

3.2.13. Using the $L^2$ inner product on the interval $[0, \pi]$, find the angle between the functions
(a) 1 and $\cos x$; (b) 1 and $\sin x$; (c) $\cos x$ and $\sin x$.

3.2.14.  Verify  the  Cauchy-Schwarz  inequality  for  the  two  particular  functions  appearing  in 
Exercise  3.1.32  using  the  L2  inner  product  on  (a)  the  unit  square;  (b)  the  unit  disk. 


Orthogonal  Vectors 

In Euclidean geometry, a particularly noteworthy configuration occurs when two vectors are
perpendicular. Perpendicular vectors meet at a right angle, $\theta = \tfrac{1}{2}\pi$ or $\tfrac{3}{2}\pi$, with $\cos\theta = 0$.
The  angle  formula  (3.17)  implies  that  the  vectors  v,  w  are  perpendicular  if  and  only  if  their 
dot  product  vanishes:  v  •  w  =  0.  Perpendicularity  is  of  interest  in  general  inner  product 
spaces,  but,  for  historical  reasons,  has  been  given  a  more  suggestive  name. 

Definition 3.6. Two elements $v, w \in V$ of an inner product space V are called orthogonal
if their inner product vanishes: $\langle v, w \rangle = 0$.

In particular, the zero element is orthogonal to all other vectors: $\langle 0, v \rangle = 0$ for all
$v \in V$. Orthogonality is a remarkably powerful tool that appears throughout the manifold
applications  of  linear  algebra,  and  often  serves  to  dramatically  simplify  many  computations. 
We  will  devote  all  of  Chapter  4  to  a  detailed  exploration  of  its  manifold  implications. 

Example 3.7. The vectors $v = (1, 2)^T$ and $w = (6, -3)^T$ are orthogonal with respect
to the Euclidean dot product in $\mathbb{R}^2$, since $v \cdot w = 1 \cdot 6 + 2 \cdot (-3) = 0$. We deduce that
they meet at a right angle. However, these vectors are not orthogonal with respect to the
weighted inner product (3.8):

$$\langle v, w \rangle = \left\langle \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 6 \\ -3 \end{pmatrix} \right\rangle = 2 \cdot 1 \cdot 6 + 5 \cdot 2 \cdot (-3) = -18 \ne 0.$$

Thus,  the  property  of  orthogonality,  like  angles  in  general,  depends  upon  which  inner 
product  is  being  used. 


Example 3.8. The polynomials $p(x) = x$ and $q(x) = x^2 - \tfrac{1}{2}$ are orthogonal with respect
to the inner product $\langle p, q \rangle = \int_0^1 p(x) \, q(x) \, dx$ on the interval $[0, 1]$, since

$$\langle x, x^2 - \tfrac{1}{2} \rangle = \int_0^1 x \, (x^2 - \tfrac{1}{2}) \, dx = \int_0^1 (x^3 - \tfrac{1}{2} x) \, dx = 0.$$

They fail to be orthogonal on most other intervals. For example, on the interval $[0, 2]$,

$$\langle x, x^2 - \tfrac{1}{2} \rangle = \int_0^2 x \, (x^2 - \tfrac{1}{2}) \, dx = \int_0^2 (x^3 - \tfrac{1}{2} x) \, dx = 3.$$

Jo  Jo 


Warning.  There  is  no  obvious  connection  between  the  orthogonality  of  two  functions  and 
the  geometry  of  their  graphs. 


Exercises 


Note: Unless stated otherwise, the inner product is the standard dot product on $\mathbb{R}^n$.

3.2.15.  (a)  Find  a  such  that  (  2,  a,  —3  )T  is  orthogonal  to  (  —1,  3,  —2  )T .  (b)  Is  there  any  value 

of  a  for  which  (  2,  a,  —3  )T  is  parallel  to  (  —1,  3,  —2  )T? 

3.2.16. Find all vectors in $\mathbb{R}^3$ that are orthogonal to both $(1, 2, 3)^T$ and $(-2, 0, 1)^T$.

3.2.17. Answer Exercises 3.2.15 and 3.2.16 using the weighted inner product
$\langle v, w \rangle = 3 v_1 w_1 + 2 v_2 w_2 + v_3 w_3$.

3.2.18. Find all vectors in $\mathbb{R}^4$ that are orthogonal to both $(1, 2, 3, 4)^T$ and $(5, 6, 7, 8)^T$.

3.2.19. Determine a basis for the subspace $W \subset \mathbb{R}^4$ consisting of all vectors which are
orthogonal to the vector $(1, 2, -1, 3)^T$.

3.2.20. Find three vectors u, v and w in $\mathbb{R}^3$ such that u and v are orthogonal, u and w are
orthogonal, but v and w are not orthogonal. Are your vectors linearly independent or
linearly dependent? Can you find vectors of the opposite dependency satisfying the same
conditions? Why or why not?

3.2.21. For what values of a, b are the vectors $(1, 1, a)^T$ and $(b, -1, 1)^T$ orthogonal

(a)  with  respect  to  the  dot  product? 

(b)  with  respect  to  the  weighted  inner  product  of  Exercise  3.2.17? 

3.2.22.  When  is  a  vector  orthogonal  to  itself? 


♦ 3.2.23. Prove that the only element w in an inner product space V that is orthogonal to every
vector, so $\langle w, v \rangle = 0$ for all $v \in V$, is the zero vector: $w = 0$.

3.2.24. A vector with $\|v\| = 1$ is known as a unit vector. Prove that if v, w are both unit
vectors, then $v + w$ and $v - w$ are orthogonal. Are they also unit vectors?

♦ 3.2.25. Let V be an inner product space and $v \in V$. Prove that the set of all vectors $w \in V$
that are orthogonal to v is a subspace of V.

3.2.26. (a) Show that the polynomials $p_1(x) = 1$, $p_2(x) = x - \tfrac{1}{2}$, $p_3(x) = x^2 - x + \tfrac{1}{6}$
are mutually orthogonal with respect to the $L^2$ inner product on the interval $[0, 1]$.

(b) Show that the functions $\sin n \pi x$, $n = 1, 2, 3, \ldots$, are mutually orthogonal with respect
to the same inner product.
Figure 3.3. Triangle Inequality.


3.2.27. Find a non-zero quadratic polynomial that is orthogonal to both $p_1(x) = 1$ and
$p_2(x) = x$ under the $L^2$ inner product on the interval $[-1, 1]$.

3.2.28. Find all quadratic polynomials that are orthogonal to the function $e^x$ with respect to
the $L^2$ inner product on the interval $[0, 1]$.

3.2.29. Determine all pairs among the functions $1$, $x$, $\cos \pi x$, $\sin \pi x$, $e^x$, that are orthogonal
with respect to the $L^2$ inner product on $[-1, 1]$.


3.2.30.  Find  two  non-zero  functions  that  are  orthogonal  with  respect  to  the  weighted  inner 
product  (f  ,g)  =  J  f(x)g(x)xdx. 


The  Triangle  Inequality 

The  familiar  triangle  inequality  states  that  the  length  of  one  side  of  a  triangle  is  at  most 
equal  to  the  sum  of  the  lengths  of  the  other  two  sides.  Referring  to  Figure  3.3,  if  the 
first  two  sides  are  represented  by  vectors  v  and  w,  then  the  third  corresponds  to  their 
sum  v  +  w.  The  triangle  inequality  turns  out  to  be  an  elementary  consequence  of  the 
Cauchy-Schwarz  inequality  (3.18),  and  hence  is  valid  in  every  inner  product  space. 


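Theorem 3.9. The norm associated with an inner product satisfies the triangle inequality

$$\|v + w\| \le \|v\| + \|w\| \quad \text{for all} \quad v, w \in V. \tag{3.23}$$

Equality holds if and only if v and w are parallel vectors pointing in the same direction,
that is, one is a non-negative scalar multiple of the other.

Proof: We compute

$$\|v + w\|^2 = \langle v + w, v + w \rangle = \|v\|^2 + 2 \langle v, w \rangle + \|w\|^2 \le \|v\|^2 + 2 \, \|v\| \, \|w\| + \|w\|^2 = (\|v\| + \|w\|)^2,$$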

where  the  middle  inequality  follows  from  Cauchy-Schwarz,  cf.  Exercise  3.2.11.  Taking 
square  roots  of  both  sides  and  using  the  fact  that  the  resulting  expressions  are  both  positive 
completes  the  proof.  Q.E.D. 

Example 3.10. The vectors $v = (1, 2, -1)^T$ and $w = (2, 0, 3)^T$ sum to $v + w = (3, 2, 2)^T$. Their
Euclidean norms are $\|v\| = \sqrt{6}$ and $\|w\| = \sqrt{13}$, while $\|v + w\| = \sqrt{17}$. The triangle
inequality (3.23) in this case says $\sqrt{17} \le \sqrt{6} + \sqrt{13}$, which is true.
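
The same check can be automated. The following Python sketch, assuming the Euclidean norm and using the illustrative helper names `norm` and `triangle_slack`, measures how far the triangle inequality is from being an equality for the vectors of Example 3.10 and for two pairs of parallel vectors.

```python
import math


def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(x * x for x in v))


def triangle_slack(v, w):
    """Return ||v|| + ||w|| - ||v + w||, which is never negative."""
    total = [a + b for a, b in zip(v, w)]
    return norm(v) + norm(w) - norm(total)


v, w = [1, 2, -1], [2, 0, 3]
slack_example = triangle_slack(v, w)                    # vectors of Example 3.10
slack_same = triangle_slack(v, [2 * x for x in v])      # w = 2 v
slack_opposite = triangle_slack(v, [-x for x in v])     # w = -v
print(slack_example, slack_same, slack_opposite)
```

The slack vanishes (up to rounding) when w is a positive multiple of v, but not when w points in the opposite direction, where $\|v + w\| = 0$ while $\|v\| + \|w\| = 2\|v\|$.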


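Example 3.11. Consider the functions $f(x) = x - 1$ and $g(x) = x^2 + 1$. Using the $L^2$
norm on the interval $[0, 1]$, we find that

$$\|f\| = \sqrt{\int_0^1 (x - 1)^2 \, dx} = \sqrt{\tfrac{1}{3}}, \quad \|g\| = \sqrt{\int_0^1 (x^2 + 1)^2 \, dx} = \sqrt{\tfrac{28}{15}}, \quad \|f + g\| = \sqrt{\int_0^1 (x^2 + x)^2 \, dx} = \sqrt{\tfrac{31}{30}}.$$

The triangle inequality requires

$$\sqrt{\tfrac{31}{30}} \le \sqrt{\tfrac{1}{3}} + \sqrt{\tfrac{28}{15}},$$

which is valid.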


The Cauchy-Schwarz and triangle inequalities look much more impressive when written
out in full detail. For the Euclidean dot product (3.1), they are

$$\left| \sum_{i=1}^{n} v_i w_i \right| \le \sqrt{\sum_{i=1}^{n} v_i^2} \, \sqrt{\sum_{i=1}^{n} w_i^2}, \qquad \sqrt{\sum_{i=1}^{n} (v_i + w_i)^2} \le \sqrt{\sum_{i=1}^{n} v_i^2} + \sqrt{\sum_{i=1}^{n} w_i^2}. \tag{3.24}$$


Theorems 3.5 and 3.9 imply that these inequalities are valid for arbitrary real numbers
$v_1, \ldots, v_n, w_1, \ldots, w_n$. For the $L^2$ inner product (3.13) on function space, they produce the
following splendid integral inequalities:

$$\left| \int_a^b f(x) \, g(x) \, dx \right| \le \sqrt{\int_a^b f(x)^2 \, dx} \, \sqrt{\int_a^b g(x)^2 \, dx}, \qquad \sqrt{\int_a^b \big( f(x) + g(x) \big)^2 \, dx} \le \sqrt{\int_a^b f(x)^2 \, dx} + \sqrt{\int_a^b g(x)^2 \, dx}, \tag{3.25}$$


which  hold  for  arbitrary  continuous  (and,  in  fact,  rather  general)  functions.  The  first  of 
these  is  the  original  Cauchy-Schwarz  inequality,  whose  proof  appeared  to  be  quite  deep 
when  it  first  appeared.  Only  after  the  abstract  notion  of  an  inner  product  space  was 
properly  formalized  did  its  innate  simplicity  and  generality  become  evident. 


Exercises 

3.2.31. Use the dot product on $\mathbb{R}^3$ to answer the following: (a) Find the angle between the
vectors $(1, 2, 3)^T$ and $(1, -1, 2)^T$. (b) Verify the Cauchy-Schwarz and triangle inequalities
for these two particular vectors. (c) Find all vectors that are orthogonal to both of these
vectors.

3.2.32.  Verify  the  triangle  inequality  for  each  pair  of  vectors  in  Exercise  3.2.1. 

3.2.33.  Verify  the  triangle  inequality  for  the  vectors  and  inner  products  in  Exercise  3.2.4. 

3.2.34.  Verify  the  triangle  inequality  for  the  functions  in  Exercise  3.2.12  for  the  indicated  inner 
products. 
3.2.35. Verify the triangle inequality for the two particular functions appearing in Exercise
3.1.32 with respect to the $L^2$ inner product on (a) the unit square; (b) the unit disk.

3.2.36. Use the $L^2$ inner product $\langle f, g \rangle = \int f(x) \, g(x) \, dx$ to answer the following:
(a) Find the “angle” between the functions 1 and x. Are they orthogonal? (b) Verify the
Cauchy-Schwarz and triangle inequalities for these two functions. (c) Find all quadratic
polynomials $p(x) = a + b x + c x^2$ that are orthogonal to both of these functions.

3.2.37. (a) Write down the explicit formulae for the Cauchy-Schwarz and triangle inequalities
based on the weighted inner product $\langle f, g \rangle = \int f(x) \, g(x) \, e^x \, dx$. (b) Verify that the
inequalities hold when $f(x) = 1$, $g(x) = e^x$ by direct computation. (c) What is the “angle”
between these two functions in this inner product?

3.2.38. Answer Exercise 3.2.37 for the Sobolev $H^1$ inner product
$\langle f, g \rangle = \int_0^1 \big[ f(x) \, g(x) + f'(x) \, g'(x) \big] \, dx$, cf. Exercise 3.1.27.

3.2.39. Prove that $\|v - w\| \ge \big| \, \|v\| - \|w\| \, \big|$. Interpret this result pictorially.

3.2.40. True or false: $\|w\| \le \|v\| + \|v + w\|$ for all $v, w \in V$.


♥ 3.2.41. (a) Prove that the space $\mathbb{R}^\infty$ consisting of all infinite sequences $x = (x_1, x_2, x_3, \ldots)$
of real numbers $x_i \in \mathbb{R}$ is a vector space. (b) Prove that the set of all sequences x such
that $\sum_{k=1}^{\infty} x_k^2 < \infty$ is a subspace, commonly denoted by $\ell^2 \subset \mathbb{R}^\infty$. (c) Write down two
examples of sequences x belonging to $\ell^2$ and two that do not belong to $\ell^2$. (d) True or
false: If $x \in \ell^2$, then $x_k \to 0$ as $k \to \infty$. (e) True or false: If $x_k \to 0$ as $k \to \infty$, then
$x \in \ell^2$. (f) Given $\alpha \in \mathbb{R}$, let x be the sequence with $x_k = \alpha^k$. For which values of $\alpha$ is
$x \in \ell^2$? (g) Answer part (f) when $x_k = k^\alpha$. (h) Prove that $\langle x, y \rangle = \sum_{k=1}^{\infty} x_k y_k$ defines an
inner product on the vector space $\ell^2$. What is the corresponding norm? (i) Write out the
Cauchy-Schwarz and triangle inequalities for the inner product space $\ell^2$.


3.3  Norms 


Every  inner  product  gives  rise  to  a  norm  that  can  be  used  to  measure  the  magnitude  or 
length  of  the  elements  of  the  underlying  vector  space.  However,  not  every  norm  that  is 
used  in  analysis  and  applications  arises  from  an  inner  product.  To  define  a  general  norm 
on  a  vector  space,  we  will  extract  those  properties  that  do  not  directly  rely  on  the  inner 
product  structure. 


Definition 3.12. A norm on a vector space V assigns a non-negative real number $\|v\|$
to each vector $v \in V$, subject to the following axioms, valid for every $v, w \in V$ and $c \in \mathbb{R}$:

(i) Positivity: $\|v\| \ge 0$, with $\|v\| = 0$ if and only if $v = 0$.

(ii) Homogeneity: $\|c \, v\| = |c| \, \|v\|$.

(iii) Triangle inequality: $\|v + w\| \le \|v\| + \|w\|$.


As  we  now  know,  every  inner  product  gives  rise  to  a  norm.  Indeed,  positivity  of  the 
norm  is  one  of  the  inner  product  axioms.  The  homogeneity  property  follows  since 


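$$\|c \, v\| = \sqrt{\langle c \, v, c \, v \rangle} = \sqrt{c^2 \langle v, v \rangle} = |c| \sqrt{\langle v, v \rangle} = |c| \, \|v\|.$$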


Finally, the triangle inequality for an inner product norm was established in Theorem 3.9.
Let us introduce some of the principal examples of norms that do not come from inner
products.

First, let $V = \mathbb{R}^n$. The 1 norm of a vector $v = (v_1, v_2, \ldots, v_n)^T$ is defined as the sum
of the absolute values of its entries:

$$\|v\|_1 = |v_1| + |v_2| + \cdots + |v_n|. \tag{3.26}$$


The max or $\infty$ norm is equal to its maximal entry (in absolute value):

$$\|v\|_\infty = \max \{ |v_1|, |v_2|, \ldots, |v_n| \}. \tag{3.27}$$


Verification of the positivity and homogeneity properties for these two norms is
straightforward; the triangle inequality is a direct consequence of the elementary inequality

$$|a + b| \le |a| + |b|, \quad a, b \in \mathbb{R},$$

for absolute values.

The Euclidean norm, 1 norm, and $\infty$ norm on $\mathbb{R}^n$ are just three representatives of the
general p norm

$$\|v\|_p = \left( \sum_{i=1}^{n} |v_i|^p \right)^{1/p}. \tag{3.28}$$


This quantity defines a norm for all $1 \le p < \infty$. The $\infty$ norm is a limiting case of (3.28) as
$p \to \infty$. Note that the Euclidean norm (3.3) is the 2 norm, and is often designated as such;
it is the only p norm which comes from an inner product. The positivity and homogeneity
properties of the p norm are not hard to establish. The triangle inequality, however, is not
trivial; in detail, it reads

$$\left( \sum_{i=1}^{n} |v_i + w_i|^p \right)^{1/p} \le \left( \sum_{i=1}^{n} |v_i|^p \right)^{1/p} + \left( \sum_{i=1}^{n} |w_i|^p \right)^{1/p}, \tag{3.29}$$

and is known as Minkowski's inequality. A complete proof can be found in [50].
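
To get a feel for the p norms, the following Python sketch evaluates (3.28) for several values of p and spot-checks Minkowski's inequality on one pair of vectors. The function name `p_norm` and the sample vectors are assumptions made for illustration; a numerical check on examples is of course not a proof.

```python
import math


def p_norm(v, p):
    """p norm of a vector for 1 <= p <= infinity (p = math.inf gives the max norm)."""
    if p < 1:
        raise ValueError("the formula defines a norm only for p >= 1")
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


v, w = [3.0, -4.0, 1.0], [-1.0, 2.0, 2.0]
for p in (1, 2, 3, 10, 100, math.inf):
    lhs = p_norm([a + b for a, b in zip(v, w)], p)   # ||v + w||_p
    rhs = p_norm(v, p) + p_norm(w, p)                # ||v||_p + ||w||_p
    print(p, p_norm(v, p), lhs <= rhs)
```

As p grows, the printed norms of v approach its largest entry in absolute value, in line with the $\infty$ norm being the limiting case.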

There are analogous norms on the space $C^0[a, b]$ of continuous functions on an interval
$[a, b]$. Basically, one replaces the previous sums by integrals. Thus, the $L^p$ norm is defined
as

$$\|f\|_p = \left( \int_a^b |f(x)|^p \, dx \right)^{1/p}. \tag{3.30}$$


In particular, the $L^1$ norm is given by integrating the absolute value of the function:

$$\|f\|_1 = \int_a^b |f(x)| \, dx. \tag{3.31}$$


The $L^2$ norm (3.13) appears as a special case, $p = 2$, and, again, is the only one arising
from an inner product. The limiting $L^\infty$ norm is defined by the maximum

$$\|f\|_\infty = \max \{ |f(x)| : a \le x \le b \}. \tag{3.32}$$

Positivity of the $L^p$ norms again relies on the fact that the only continuous non-negative
function with zero integral is the zero function. Homogeneity is easily established. On the
other hand, the proof of the general triangle, or Minkowski, inequality for $p \ne 1, 2, \infty$ is
again not trivial, [19, 68].

Figure 3.4. Distance Between Vectors.

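Example 3.13. Consider the polynomial $p(x) = 3x^2 - 2$ on the interval $-1 \le x \le 1$.
Its $L^2$ norm is

$$\|p\|_2 = \sqrt{\int_{-1}^{1} (3x^2 - 2)^2 \, dx} = \sqrt{\tfrac{18}{5}} = 1.8973\ldots.$$

Its $L^\infty$ norm is

$$\|p\|_\infty = \max \{ |3x^2 - 2| : -1 \le x \le 1 \} = 2,$$

with the maximum occurring at $x = 0$. Finally, its $L^1$ norm is

$$\|p\|_1 = \int_{-1}^{1} |3x^2 - 2| \, dx = \int_{-1}^{-\sqrt{2/3}} (3x^2 - 2) \, dx + \int_{-\sqrt{2/3}}^{\sqrt{2/3}} (2 - 3x^2) \, dx + \int_{\sqrt{2/3}}^{1} (3x^2 - 2) \, dx$$

$$= \left( \tfrac{4}{3} \sqrt{\tfrac{2}{3}} - 1 \right) + \tfrac{8}{3} \sqrt{\tfrac{2}{3}} + \left( \tfrac{4}{3} \sqrt{\tfrac{2}{3}} - 1 \right) = \tfrac{16}{3} \sqrt{\tfrac{2}{3}} - 2 = 2.3546\ldots.$$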
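
The three values in Example 3.13 can be approximated numerically. The sketch below is a rough illustration, assuming a simple midpoint rule on an equally spaced grid; the function name `lp_norm` and the grid size `n` are arbitrary choices, and the sampled maximum used for the $L^\infty$ norm is in general only a lower bound for the true maximum.

```python
import math


def lp_norm(f, a, b, p, n=20000):
    """Approximate the L^p norm of f on [a, b] using n equal subintervals."""
    h = (b - a) / n
    if p == math.inf:
        # Sample the endpoints of every subinterval and take the largest |f|.
        return max(abs(f(a + i * h)) for i in range(n + 1))
    # Midpoint rule for the integral of |f(x)|^p.
    total = sum(abs(f(a + (i + 0.5) * h)) ** p for i in range(n))
    return (total * h) ** (1.0 / p)


def poly(x):
    """The polynomial p(x) = 3 x^2 - 2 of Example 3.13."""
    return 3 * x ** 2 - 2


for p in (1, 2, math.inf):
    print(p, lp_norm(poly, -1.0, 1.0, p))
```

The printed approximations can be compared with the exact values $\tfrac{16}{3}\sqrt{2/3} - 2$, $\sqrt{18/5}$, and $2$ obtained above.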

Every norm defines a distance between vector space elements, namely

$$d(v, w) = \|v - w\|. \tag{3.33}$$


For  the  standard  dot  product  norm,  we  recover  the  usual  notion  of  distance  between  points 
in  Euclidean  space.  Other  types  of  norms  produce  alternative  (and  sometimes  quite  useful) 
notions  of  distance  that  are,  nevertheless,  subject  to  all  the  familiar  properties: 

(a) Symmetry: $d(v, w) = d(w, v)$;

(b) Positivity: $d(v, w) = 0$ if and only if $v = w$;

(c) Triangle inequality: $d(v, w) \le d(v, z) + d(z, w)$.

Just as the distance between vectors measures how close they are to each other (keeping
in mind that this measure of proximity depends on the underlying choice of norm),
so the distance between functions in a normed function space tells something about
how close they are to each other, which is related, albeit subtly, to how close their graphs
are. Thus, the norm serves to define the topology of the underlying vector space, which
determines notions of open and closed sets, convergence, and so on, [19, 68].
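
As a small illustration of how the choice of norm changes the distance, the following Python sketch computes $d(v, w) = \|v - w\|$ in the 1, 2, and $\infty$ norms and spot-checks the triangle inequality through a third point. The function name `distance` and the sample points are hypothetical and chosen only for this example.

```python
import math


def distance(v, w, p=2):
    """Distance d(v, w) = ||v - w|| in the p norm (p = math.inf for the max norm)."""
    diff = [abs(a - b) for a, b in zip(v, w)]
    if p == math.inf:
        return max(diff)
    return sum(d ** p for d in diff) ** (1.0 / p)


v, w, z = [1.0, 2.0], [4.0, 6.0], [0.0, 0.0]
for p in (1, 2, math.inf):
    d = distance(v, w, p)
    # Triangle inequality through the intermediate point z.
    print(p, d, d <= distance(v, z, p) + distance(z, w, p))
```

The same pair of points is at three different distances, depending on which norm does the measuring.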
Exercises


3.3.1.  Compute  the  1,  2,  3,  and  oo  norms  of  the  vectors 


0 


inequality  in  each  case. 


3.3.2.  Answer  Exercise  3.3.1  for  (a) 


2 

-1 


1 

-2 


(b) 


0 


1\ 

0 

V-i/ 


.  Verify  the  triangle 


/  — 1  \ 
1 
0 


>  (c) 


1\ 

-2 

v-i/ 


3.3.3. Which two of the vectors $u = (-2, 2, 1)^T$, $v = (1, 4, 1)^T$, $w = (0, 0, -1)^T$ are closest
to each other in distance for (a) the Euclidean norm? (b) the $\infty$ norm? (c) the 1 norm?

3.3.4.  (a)  Compute  the  L°°  norm  on  [0, 1]  of  the  functions  f(pc)  =  ^  —  x  and  g(pc)  =  x  —  x2 . 

(b)  Verify  the  triangle  inequality  for  these  two  particular  functions. 

3.3.5.  Answer  Exercise  3.3.4  using  the  L1  norm. 

3.3.6. Which two of the functions $f(x) = 1$, $g(x) = x$, $h(x) = \sin \pi x$ are closest to each other
on the interval $[0, 1]$ under (a) the $L^1$ norm? (b) the $L^2$ norm? (c) the $L^\infty$ norm?

3.3.7.  Consider  the  functions  f(x)  =  1  and  g(pc)  =  x  —  |  as  elements  of  the  vector  space 
C°[0, 1].  For  each  of  the  following  norms,  compute  ||  /  ||,  ||g||,  ||  /  +  g  ||,  and  verify  the 
triangle  inequality:  (a)  the  L  norm;  (b)  the  L  norm;  (c)  the  \y  norm;  (d)  the  L°°  norm. 

3.3.8. Answer Exercise 3.3.7 when $f(x) = e^x$ and $g(x) = e^{-x}$.


3.3.9. Carefully prove that $\| (x, y)^T \| = |x| + 2 \, |x - y|$ defines a norm on $\mathbb{R}^2$.

3.3.10.  Prove  that  the  following  formulas  define  norms  on  R2:  (a)  ||v||  =  \j2v\  +  3^ 

(b) 


(e) 


V 

V 


=  \/2v?  —  v-i  v0  +  2v2 


=  max- 


4  2 

Vr 


2  ’ 


(C) 


=  2 


u- 


+ 


V 


2  h 


(d) 


[\vl-v2\,\vl+v2\)i  (f) 


3.3.11.  Which  of  the  following  formulas  define  norms  on  R^?  (a) 

(b) 


“  v2  I  +  I  V1  +  v 

3 


=  max 
2 


l2 


V- 


Vc 


} 


=  A/2uf  +  Uo  +  3u 


(d) 


v 

V 


-■A2 

v 


(  +  2v1v2+v$  +  v$  ,  (c) 


max 


{4i 


v< 


v. 


+  u. 


A  it 


u- 


,  (e) 


|  +  maxj  1 1>2  |,  |  v3  |  }. 


3.3.12. Prove that two parallel vectors v and w have the same norm if and only if $v = \pm w$.

3.3.13. True or false: If $\|v + w\| = \|v\| + \|w\|$, then v, w are parallel vectors.

3.3.14. Prove that the $\infty$ norm on $\mathbb{R}^2$ does not come from an inner product. Hint: Look at
Exercise 3.1.13.

3.3.15. Can formula (3.11) be used to define an inner product for (a) the 1 norm $\|v\|_1$ on $\mathbb{R}^2$?
(b) the $\infty$ norm $\|v\|_\infty$ on $\mathbb{R}^2$?

♦ 3.3.16. Prove that $\lim_{p \to \infty} \|v\|_p = \|v\|_\infty$ for all $v \in \mathbb{R}^2$.


♦ 3.3.17. Justify the triangle inequality for (a) the $L^1$ norm (3.31); (b) the $L^\infty$ norm (3.32).

♦ 3.3.18. Let $w(x) > 0$ for $a \le x \le b$ be a weight function. (a) Prove that
$\|f\|_{1,w} = \int_a^b |f(x)| \, w(x) \, dx$ defines a norm on $C^0[a, b]$, called the weighted $L^1$ norm.
(b) Do the same for the weighted $L^\infty$ norm $\|f\|_{\infty,w} = \max \{ |f(x)| \, w(x) : a \le x \le b \}$.
3.3.19. Let $\| \cdot \|_1$ and $\| \cdot \|_2$ be two different norms on a vector space V. (a) Prove that
$\|v\| = \max \{ \|v\|_1, \|v\|_2 \}$ defines a norm on V. (b) Does $\|v\| = \min \{ \|v\|_1, \|v\|_2 \}$ define
a norm? (c) Does the arithmetic mean $\|v\| = \tfrac{1}{2} \big( \|v\|_1 + \|v\|_2 \big)$ define a norm?
(d) Does the geometric mean $\|v\| = \sqrt{\|v\|_1 \, \|v\|_2}$ define a norm?


Unit  Vectors 


Let V be a normed vector space. The elements $u \in V$ that have unit norm, $\|u\| = 1$, play
a special role, and are known as unit vectors (or functions or elements). The following easy
lemma shows how to construct a unit vector pointing in the same direction as any given
nonzero vector.


Lemma 3.14. If $v \ne 0$ is any nonzero vector, then the vector $u = v / \|v\|$ obtained by
dividing v by its norm is a unit vector parallel to v.


Proof: We compute, making use of the homogeneity property of the norm and the fact
that $\|v\|$ is a scalar,

$$\|u\| = \left\| \frac{v}{\|v\|} \right\| = \frac{\|v\|}{\|v\|} = 1. \qquad \text{Q.E.D.}$$


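Example 3.15. The vector $v = (1, -2)^T$ has length $\|v\|_2 = \sqrt{5}$ with respect to the
standard Euclidean norm. Therefore, the unit vector pointing in the same direction is

$$u = \frac{v}{\|v\|_2} = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{pmatrix}.$$

On the other hand, for the 1 norm, $\|v\|_1 = 3$, and so

$$u = \frac{v}{\|v\|_1} = \frac{1}{3} \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \begin{pmatrix} \frac{1}{3} \\ -\frac{2}{3} \end{pmatrix}$$

is the unit vector parallel to v in the 1 norm. Finally, $\|v\|_\infty = 2$, and hence the
corresponding unit vector for the $\infty$ norm is

$$u = \frac{v}{\|v\|_\infty} = \frac{1}{2} \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \begin{pmatrix} \frac{1}{2} \\ -1 \end{pmatrix}.$$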

Thus,  the  notion  of  unit  vector  will  depend  upon  which  norm  is  being  used. 
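
The construction of Lemma 3.14 translates directly into code. The following Python sketch, with the illustrative helper names `p_norm` and `normalize`, divides a nonzero vector by its norm and confirms that the result has unit norm in whichever p norm was used.

```python
import math


def p_norm(v, p):
    """p norm of a vector; p = math.inf selects the max norm."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def normalize(v, p=2):
    """Return v / ||v||_p, the unit vector in the p norm parallel to a nonzero v."""
    size = p_norm(v, p)
    if size == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [x / size for x in v]


v = [1.0, -2.0]
for p in (1, 2, math.inf):
    u = normalize(v, p)
    print(p, u, p_norm(u, p))   # the last column should be 1 up to rounding
```

Applied to $v = (1, -2)^T$, the three results can be compared with the unit vectors found in Example 3.15.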

Example 3.16. Similarly, on the interval $[0, 1]$, the quadratic polynomial $p(x) = x^2 - \tfrac{1}{2}$
has $L^2$ norm

$$\|p\|_2 = \sqrt{\int_0^1 (x^2 - \tfrac{1}{2})^2 \, dx} = \sqrt{\int_0^1 (x^4 - x^2 + \tfrac{1}{4}) \, dx} = \sqrt{\tfrac{7}{60}}.$$

Therefore, $u(x) = \dfrac{p(x)}{\|p\|_2} = \sqrt{\tfrac{60}{7}} \, x^2 - \sqrt{\tfrac{15}{7}}$ is a “unit polynomial”, $\|u\|_2 = 1$, which is
“parallel” to (or, more precisely, a scalar multiple of) the polynomial p. On the other
hand, for the $L^\infty$ norm,

$$\|p\|_\infty = \max \{ |x^2 - \tfrac{1}{2}| : 0 \le x \le 1 \} = \tfrac{1}{2},$$

and hence, in this case, $u(x) = 2 p(x) = 2x^2 - 1$ is the corresponding unit polynomial.

Figure 3.5. Unit Balls and Spheres for 1, 2, and $\infty$ Norms in $\mathbb{R}^2$.
The unit sphere for the given norm is defined as the set of all unit vectors

$$S_1 = \{ \|u\| = 1 \}, \quad \text{while} \quad S_r = \{ \|u\| = r \} \tag{3.34}$$

is the sphere of radius $r > 0$. Thus, the unit sphere for the Euclidean norm on $\mathbb{R}^n$ is the
usual round sphere

$$S_1 = \{ x \in \mathbb{R}^n : x_1^2 + x_2^2 + \cdots + x_n^2 = 1 \}.$$

The unit sphere for the $\infty$ norm is the surface of a unit cube:

$$S_1 = \{ x \in \mathbb{R}^n : |x_i| \le 1, \ i = 1, \ldots, n, \ \text{and either } x_1 = \pm 1 \text{ or } x_2 = \pm 1 \text{ or } \ldots \text{ or } x_n = \pm 1 \}.$$

For the 1 norm,

$$S_1 = \{ x \in \mathbb{R}^n : |x_1| + |x_2| + \cdots + |x_n| = 1 \}$$

is the unit diamond in two dimensions, unit octahedron in three dimensions, and unit cross
polytope in general. See Figure 3.5 for the two-dimensional pictures.
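
The different shapes of these unit spheres can be probed point by point. The following Python sketch, using the hypothetical helpers `p_norm` and `classify`, reports where a few sample points of the plane lie relative to the unit sphere of the 1, 2, and $\infty$ norms; the tolerance `tol` is an arbitrary choice for deciding when a computed norm counts as exactly 1.

```python
import math


def p_norm(v, p):
    """p norm of a vector; p = math.inf selects the max norm."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def classify(x, p, tol=1e-12):
    """Locate x relative to the unit sphere {||x||_p = 1}."""
    size = p_norm(x, p)
    if abs(size - 1.0) <= tol:
        return "on the unit sphere"
    return "inside the unit ball" if size < 1.0 else "outside the unit ball"


for point in ([0.6, 0.6], [1.0, 0.0], [1.0, 1.0]):
    print(point, [classify(point, p) for p in (1, 2, math.inf)])
```

The point $(0.6, 0.6)^T$ lies outside the diamond but inside both the disk and the square, reflecting the general ordering $\|x\|_\infty \le \|x\|_2 \le \|x\|_1$.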

In all cases, the unit ball $B_1 = \{ \|u\| \le 1 \}$ consists of all vectors of norm less than or
equal to 1, and has the unit sphere as its boundary. If V is a finite-dimensional normed
vector space, then the unit ball $B_1$ is a compact subset, meaning that it is closed and
bounded.  This  basic  topological  fact,  which  is  not  true  in  infinite-dimensional  normed 
spaces,  underscores  the  distinction  between  finite-dimensional  vector  analysis  and  the  vastly 
more  complicated  infinite-dimensional  realm. 


Exercises 


3.3.20. Find a unit vector in the same direction as $v = (1, 2, -3)^T$ for (a) the Euclidean norm, (b) the weighted norm $\|v\|^2 = 2v_1^2 + v_2^2 + \tfrac13 v_3^2$, (c) the 1 norm, (d) the ∞ norm, (e) the norm based on the inner product $2v_1w_1 - v_1w_2 - v_2w_1 + 2v_2w_2 - v_2w_3 - v_3w_2 + 2v_3w_3$.

3.3.21. Show that, for every choice of given angles θ, φ, and ψ, the following are unit vectors in the Euclidean norm: (a) $(\cos\theta \cos\varphi,\ \cos\theta \sin\varphi,\ \sin\theta)^T$. (b) $\tfrac{1}{\sqrt2} (\cos\theta,\ \sin\theta,\ \cos\varphi,\ \sin\varphi)^T$. (c) $(\cos\theta \cos\varphi \cos\psi,\ \cos\theta \cos\varphi \sin\psi,\ \cos\theta \sin\varphi,\ \sin\theta)^T$.

3.3.22. How many unit vectors are parallel to a given vector $v \ne 0$? (a) 1, (b) 2, (c) 3, (d) ∞, (e) depends on the norm. Explain your answer.

3.3.23. Plot the unit circle (sphere) for (a) the weighted norm $\|v\| = \sqrt{v_1^2 + 4 v_2^2}$; (b) the norm based on the inner product (3.9); (c) the norm of Exercise 3.3.9.

3.3.24. Draw the unit circle for each norm in Exercise 3.3.10.

3.3.25. Sketch the unit sphere $S_1 \subset \mathbb{R}^3$ for (a) the $L^1$ norm, (b) the $L^\infty$ norm, (c) the weighted norm $\|v\|^2 = 2v_1^2 + v_2^2 + 3v_3^2$, (d) $\|v\| = \max\{ |v_1 + v_2|,\ |v_1 + v_3|,\ |v_2 + v_3| \}$.


3.3.26. Let $v \ne 0$ be any nonzero vector in a normed vector space V. Show how to construct a
new norm on V that changes v into a unit vector.


3.3.27.  True  or  false:  Two  norms  on  a  vector  space  have  the  same  unit  sphere  if  and  only  if 
they  are  the  same  norm. 

3.3.28. Find the unit function that is a constant multiple of the function $f(x) = x - \tfrac13$ with
respect to the (a) $L^1$ norm on [0,1]; (b) $L^2$ norm on [0,1]; (c) $L^\infty$ norm on [0,1]; (d) $L^1$
norm on [−1,1]; (e) $L^2$ norm on [−1,1]; (f) $L^\infty$ norm on [−1,1].

3.3.29. For which norms is the constant function $f(x) = 1$ a unit function?

(a) $L^1$ norm on [0,1]; (b) $L^2$ norm on [0,1]; (c) $L^\infty$ norm on [0,1];

(d) $L^1$ norm on [−1,1]; (e) $L^2$ norm on [−1,1]; (f) $L^\infty$ norm on [−1,1];

(g) $L^1$ norm on $\mathbb{R}$; (h) $L^2$ norm on $\mathbb{R}$; (i) $L^\infty$ norm on $\mathbb{R}$.

♥ 3.3.30. A subset $S \subset \mathbb{R}^n$ is called convex if, for all $x, y \in S$, the line segment joining x to y
is also in S, i.e., $t\,x + (1 - t)\,y \in S$ for all $0 \le t \le 1$. Prove that the unit ball is a convex
subset of a normed vector space. Is the unit sphere convex?


Equivalence  of  Norms 

While  there  are  many  different  types  of  norms,  in  a  finite-dimensional  vector  space  they 
are  all  more  or  less  equivalent.  “Equivalence”  does  not  mean  that  they  assume  the  same 
values,  but  rather  that  they  are,  in  a  certain  sense,  always  close  to  one  another,  and  so, 
for  many  analytical  purposes,  may  be  used  interchangeably.  As  a  consequence,  we  may  be 
able  to  simplify  the  analysis  of  a  problem  by  choosing  a  suitably  adapted  norm;  examples 
can  be  found  in  Chapter  9. 


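Theorem 3.17. Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be any two norms on $\mathbb{R}^n$. Then there exist positive constants $0 < c^\star \le C^\star$ such that
\[ c^\star \|v\|_1 \le \|v\|_2 \le C^\star \|v\|_1 \qquad \text{for every } v \in \mathbb{R}^n. \tag{3.35} \]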


Proof: We just sketch the basic idea, leaving the details to a more rigorous real analysis course, cf. [19; §7.6]. We begin by noting that a norm defines a continuous real-valued function $f(v) = \|v\|$ on $\mathbb{R}^n$. (Continuity is, in fact, a consequence of the triangle inequality.) Let $S_1 = \{ \|u\|_1 = 1 \}$ denote the unit sphere of the first norm. Every continuous function defined on a compact set achieves both a maximum and a minimum value. Thus, restricting the second norm function to the unit sphere $S_1$ of the first norm, we can set
\[ c^\star = \min \{ \, \|u\|_2 \mid u \in S_1 \, \}, \qquad C^\star = \max \{ \, \|u\|_2 \mid u \in S_1 \, \}. \tag{3.36} \]
Moreover, $0 < c^\star \le C^\star < \infty$, with equality $c^\star = C^\star$ holding if and only if the second norm is a constant multiple of the first. The minimum and maximum (3.36) will serve as the constants in the desired inequalities (3.35). Indeed, by definition,
\[ c^\star \le \|u\|_2 \le C^\star \qquad \text{when} \qquad \|u\|_1 = 1, \tag{3.37} \]
which proves that (3.35) is valid for all unit vectors $v = u \in S_1$. To prove the inequalities in general, assume $v \ne 0$. (The case $v = 0$ is trivial.) Lemma 3.14 says that $u = v / \|v\|_1 \in S_1$ is a unit vector in the first norm: $\|u\|_1 = 1$. Moreover, by the homogeneity property of the norm, $\|u\|_2 = \|v\|_2 / \|v\|_1$. Substituting into (3.37) and clearing denominators completes the proof of (3.35). Q.E.D.


Example 3.18. Consider the Euclidean norm $\|\cdot\|_2$ and the max norm $\|\cdot\|_\infty$ on $\mathbb{R}^n$. According to (3.36), the bounding constants are found by minimizing and maximizing $\|u\|_\infty = \max\{ |u_1|, \dots, |u_n| \}$ over all unit vectors $\|u\|_2 = 1$ on the (round) unit sphere. The maximal value is achieved at the poles $\pm e_k$, with $\|\pm e_k\|_\infty = C^\star = 1$. The minimal value is attained at the points $\bigl( \pm \tfrac{1}{\sqrt n}, \dots, \pm \tfrac{1}{\sqrt n} \bigr)^T$, whereby $c^\star = \tfrac{1}{\sqrt n}$. Therefore,
\[ \frac{1}{\sqrt n}\, \|v\|_2 \le \|v\|_\infty \le \|v\|_2. \tag{3.38} \]
We can interpret these inequalities as follows. Suppose v is a vector lying on the unit sphere in the Euclidean norm, so $\|v\|_2 = 1$. Then (3.38) tells us that its ∞ norm is bounded from above and below by $\tfrac{1}{\sqrt n} \le \|v\|_\infty \le 1$. Therefore, the Euclidean unit sphere sits inside the ∞ norm unit sphere and outside the ∞ norm sphere of radius $\tfrac{1}{\sqrt n}$. Figure 3.6 illustrates the two-dimensional situation: the unit circle is inside the unit square, and contains the square of size $\tfrac{1}{\sqrt 2}$.
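
The inequalities (3.38) are easy to spot-check numerically. The sketch below is illustrative only: the function names, the random sample, and the rounding tolerance `tol` are assumptions made here, not part of the text. It also confirms that the two bounds are attained at a pole $e_1$ and at the all-ones vector, respectively.

```python
import math
import random

def norm_2(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))

def norm_inf(v):
    """Infinity (max) norm."""
    return max(abs(x) for x in v)

def satisfies_3_38(v, tol=1e-12):
    """Check (1/sqrt(n)) ||v||_2 <= ||v||_inf <= ||v||_2 up to rounding."""
    n = len(v)
    lower = norm_2(v) / math.sqrt(n)
    middle = norm_inf(v)
    upper = norm_2(v)
    return lower <= middle + tol and middle <= upper + tol

random.seed(0)  # fixed seed so the experiment is repeatable
samples = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(1000)]
print(all(satisfies_3_38(v) for v in samples))

# The upper bound is attained at a pole, the lower bound at (1, ..., 1).
e1 = [1.0, 0.0, 0.0, 0.0, 0.0]
ones = [1.0] * 5
print(norm_inf(e1), norm_2(e1))
print(norm_inf(ones), norm_2(ones) / math.sqrt(5))
```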

One significant consequence of the equivalence of norms is that, in $\mathbb{R}^n$, convergence is independent of the norm. The following are all equivalent to the standard notion of convergence of a sequence $u^{(1)}, u^{(2)}, u^{(3)}, \dots$ of vectors in $\mathbb{R}^n$:

(a) the vectors converge: $u^{(k)} \to u^\star$;

(b) the individual coordinates all converge: $u_i^{(k)} \to u_i^\star$ for $i = 1, \dots, n$;

(c) the norm of the difference goes to zero: $\|u^{(k)} - u^\star\| \to 0$.


The  last  version,  known  as  convergence  in  norm ,  does  not  depend  on  which  norm  is  chosen. 
Indeed,  the  inequality  (3.35)  implies  that  if  one  norm  goes  to  zero,  so  does  any  other 
norm. A consequence is that all norms on $\mathbb{R}^n$ induce the same topology: convergence of sequences, notions of open and closed sets, and so on. None of this is true in infinite-dimensional function space! A rigorous development of the underlying topological and analytical properties of compactness, continuity, and convergence is beyond the scope of this course. The motivated student is encouraged to consult a text in real analysis, e.g., [19, 68], to find the relevant definitions, theorems, and proofs.

Example 3.19. Consider the infinite-dimensional vector space $C^0[0,1]$ consisting of all continuous functions on the interval [0,1]. The functions
\[ f_n(x) = \begin{cases} 1 - n x, & 0 \le x \le \tfrac1n, \\ 0, & \tfrac1n \le x \le 1, \end{cases} \]
have identical $L^\infty$ norms
\[ \|f_n\|_\infty = \sup \{ \, |f_n(x)| \mid 0 \le x \le 1 \, \} = 1. \]
On the other hand, their $L^2$ norm
\[ \|f_n\|_2 = \sqrt{\int_0^1 f_n(x)^2 \, dx} = \sqrt{\int_0^{1/n} (1 - n x)^2 \, dx} = \frac{1}{\sqrt{3n}} \]
goes to zero as $n \to \infty$. This example shows that there is no constant $C^\star$ such that
\[ \|f\|_\infty \le C^\star \|f\|_2 \]
for all $f \in C^0[0,1]$. Thus, the $L^\infty$ and $L^2$ norms on $C^0[0,1]$ are not equivalent: there exist functions that have unit $L^\infty$ norm, but arbitrarily small $L^2$ norm. Similar comparative results can be established for the other function space norms. Analysis and topology on function space is intimately linked to the underlying choice of norm.
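
The behavior of the two norms in Example 3.19 can be observed numerically. The following sketch is an illustration under stated assumptions: the sup norm is approximated by sampling on a uniform grid, the integral by the midpoint rule, and the function names and sample counts are choices made here rather than anything prescribed by the text.

```python
import math

def f(n, x):
    """The tent function f_n of Example 3.19 on [0, 1]."""
    return 1.0 - n * x if x <= 1.0 / n else 0.0

def sup_norm(n, samples=10001):
    """Approximate the L-infinity norm by sampling a grid containing x = 0."""
    return max(abs(f(n, i / (samples - 1))) for i in range(samples))

def l2_norm(n, samples=20000):
    """Approximate the L2 norm on [0, 1] with the midpoint rule."""
    h = 1.0 / samples
    total = sum(f(n, (i + 0.5) * h) ** 2 for i in range(samples))
    return math.sqrt(total * h)

for n in (1, 10, 100, 1000):
    # Columns: n, sup norm, numerical L2 norm, exact value 1/sqrt(3n).
    print(n, sup_norm(n), l2_norm(n), 1.0 / math.sqrt(3 * n))
```

The second column stays at 1 while the third shrinks like $1/\sqrt{3n}$, so the ratio $\|f_n\|_\infty / \|f_n\|_2$ is unbounded.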


Exercises 

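3.3.31. Check the validity of the inequalities (3.38) for the particular vectors
(a) $(1, -1)^T$, (b) $(1, 2, 3)^T$, (c) $(1, 1, 1, 1)^T$, (d) $(1, -1, -2, -1, 1)^T$.

3.3.32. Find all $v \in \mathbb{R}^2$ such that (a) $\|v\|_1 = \|v\|_\infty$, (b) $\|v\|_1 = \|v\|_2$, (c) $\|v\|_2 = \|v\|_\infty$, (d) $\|v\|_\infty = \tfrac{1}{\sqrt2} \|v\|_2$.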


3.3.33.  How  would  you  quantify  the  following  statement:  The  norm  of  a  vector  is  small  if  and 
only  if  all  its  entries  are  small. 


3.3.34. Can you find an elementary proof of the inequalities $\|v\|_\infty \le \|v\|_2 \le \sqrt{n}\, \|v\|_\infty$ for $v \in \mathbb{R}^n$ directly from the formulas for the norms?

3.3.35. (i) Show the equivalence of the Euclidean norm and the 1 norm on $\mathbb{R}^n$ by proving $\|v\|_2 \le \|v\|_1 \le \sqrt{n}\, \|v\|_2$. (ii) Verify that the vectors in Exercise 3.3.31 satisfy both inequalities. (iii) For which vectors $v \in \mathbb{R}^n$ is (a) $\|v\|_2 = \|v\|_1$? (b) $\|v\|_1 = \sqrt{n}\, \|v\|_2$?

3.3.36. (i) Establish the equivalence inequalities (3.35) between the 1 and ∞ norms.
(ii) Verify them for the vectors in Exercise 3.3.31.
(iii) For which vectors $v \in \mathbb{R}^n$ are your inequalities equalities?


3.3.37. Let $\|\cdot\|_2$ denote the usual Euclidean norm on $\mathbb{R}^2$. Determine the constants in the norm equivalence inequalities $c^\star \|v\| \le \|v\|_2 \le C^\star \|v\|$ for the following norms: (a) the weighted norm $\|v\| = \sqrt{2 v_1^2 + 3 v_2^2}$, (b) the norm $\|v\| = \max \{ |v_1 + v_2|,\ |v_1 - v_2| \}$.

3.3.38. Let $\|\cdot\|$ be a norm on $\mathbb{R}^n$. Prove that there is a constant $C > 0$ such that the entries of every $v = (v_1, v_2, \dots, v_n)^T \in \mathbb{R}^n$ are all bounded, in absolute value, by $|v_i| \le C \|v\|$.

3.3.39. Prove that if $[a, b]$ is a bounded interval and $f \in C^0[a, b]$, then $\|f\|_2 \le \sqrt{b - a}\; \|f\|_\infty$.

♥ 3.3.40. In this exercise, the indicated function norms are taken over all of $\mathbb{R}$.
(a) Let $f_n(x) = 1$ for $-n \le x \le n$, and $f_n(x) = 0$ otherwise. Prove that $\|f_n\|_\infty = 1$, but $\|f_n\|_2 \to \infty$ as $n \to \infty$.
(b) Explain why there is no constant C such that $\|f\|_2 \le C \|f\|_\infty$ for all functions f.
(c) Let $f_n(x) = \sqrt{n/2}$ for $-\tfrac1n \le x \le \tfrac1n$, and $f_n(x) = 0$ otherwise. Prove that $\|f_n\|_2 = 1$, but $\|f_n\|_\infty \to \infty$ as $n \to \infty$. Conclude that there is no constant C such that $\|f\|_\infty \le C \|f\|_2$.
(d) Construct similar examples that disprove the related inequalities
(i) $\|f\|_\infty \le C \|f\|_1$, (ii) $\|f\|_1 \le C \|f\|_2$, (iii) $\|f\|_2 \le C \|f\|_1$.

♥ 3.3.41. (a) Prove that the $L^\infty$ and $L^2$ norms on the vector space $C^0[-1, 1]$ are not equivalent.
Hint: Look at Exercise 3.3.40 for ideas. (b) Can you establish a bound in either direction,
i.e., $\|f\|_\infty \le C \|f\|_2$ or $\|f\|_2 \le \widetilde{C} \|f\|_\infty$ for all $f \in C^0[-1, 1]$ for some positive constants
$C, \widetilde{C}$? (c) Are the $L^1$ and $L^\infty$ norms equivalent?

♦ 3.3.42. What does it mean if the constants defined in (3.36) are equal: $c^\star = C^\star$?


3.3.43. Suppose $\langle v, w \rangle_1$ and $\langle v, w \rangle_2$ are two inner products on the same vector space V. For
which $\alpha, \beta \in \mathbb{R}$ is the linear combination $\langle v, w \rangle = \alpha \langle v, w \rangle_1 + \beta \langle v, w \rangle_2$ a legitimate
inner product? Hint: The case $\alpha, \beta > 0$ is easy. However, some negative values are also
permitted, and your task is to decide which.


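♦ 3.3.44. Suppose $\|\cdot\|_1$ and $\|\cdot\|_2$ are two norms on $\mathbb{R}^n$. Prove that the corresponding matrix norms satisfy $\tilde{c}^\star \|A\|_1 \le \|A\|_2 \le \widetilde{C}^\star \|A\|_1$ for any $n \times n$ matrix A, for some positive constants $0 < \tilde{c}^\star \le \widetilde{C}^\star$.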


Matrix  Norms 

Each norm on $\mathbb{R}^n$ will naturally induce a norm on the vector space $\mathcal{M}_{n \times n}$ of all $n \times n$
matrices. Roughly speaking, the matrix norm tells us how much a linear transformation
stretches vectors relative to the given norm. Matrix norms will play an important role
in Chapters 8 and 9, particularly in our analysis of linear iterative systems and iterative
numerical methods for solving both linear and nonlinear systems.

We work exclusively with real $n \times n$ matrices in this section, although the results straightforwardly
extend to complex matrices. We begin by fixing a norm $\|\cdot\|$ on $\mathbb{R}^n$. The norm
may or may not come from an inner product; this is irrelevant as far as the construction
goes.


Theorem 3.20. If $\|\cdot\|$ is any norm on $\mathbb{R}^n$, then the quantity
\[ \|A\| = \max \{ \, \|A u\| \mid \|u\| = 1 \, \} \tag{3.39} \]
defines the norm of an $n \times n$ matrix $A \in \mathcal{M}_{n \times n}$, called the associated natural matrix norm.


Proof: First note that $\|A\| < \infty$, since the maximum is taken on a closed and bounded subset, namely the unit sphere $S_1 = \{ \|u\| = 1 \}$ for the given norm. To show that (3.39) defines a norm, we need to verify the three basic axioms of Definition 3.12.

Non-negativity, $\|A\| \ge 0$, is immediate. Suppose $\|A\| = 0$. This means that, for every unit vector, $\|Au\| = 0$, and hence $Au = 0$ whenever $\|u\| = 1$. If $0 \ne v \in \mathbb{R}^n$ is any nonzero vector, then $u = v/r$, where $r = \|v\|$, is a unit vector, so
\[ A v = A(r u) = r\, A u = 0. \tag{3.40} \]
Therefore, $Av = 0$ for every $v \in \mathbb{R}^n$, which implies that $A = O$ is the zero matrix. This serves to prove the positivity property: $\|A\| = 0$ if and only if $A = O$.

As for homogeneity, if $c \in \mathbb{R}$ is any scalar, then
\[ \|cA\| = \max \{ \|cAu\| \} = \max \{ |c|\, \|Au\| \} = |c| \max \{ \|Au\| \} = |c|\, \|A\|. \]
Finally, to prove the triangle inequality, we use the fact that the maximum of the sum of quantities is bounded by the sum of their individual maxima. Therefore, since the norm on $\mathbb{R}^n$ satisfies the triangle inequality,
\[ \|A + B\| = \max \{ \|Au + Bu\| \} \le \max \{ \|Au\| + \|Bu\| \} \le \max \{ \|Au\| \} + \max \{ \|Bu\| \} = \|A\| + \|B\|. \]
Q.E.D.


The  property  that  distinguishes  a  matrix  norm  from  a  generic  norm  on  the  space  of 
matrices  is  the  fact  that  it  also  obeys  a  very  useful  product  inequality. 

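Theorem 3.21. A natural matrix norm satisfies
\[ \|Av\| \le \|A\|\, \|v\| \qquad \text{for all } A \in \mathcal{M}_{n \times n},\ v \in \mathbb{R}^n. \tag{3.41} \]
Furthermore,
\[ \|AB\| \le \|A\|\, \|B\| \qquad \text{for all } A, B \in \mathcal{M}_{n \times n}. \tag{3.42} \]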


Proof: Note first that, by definition, $\|Au\| \le \|A\|$ for all unit vectors $\|u\| = 1$. Then, letting $v = r u$ where u is a unit vector and $r = \|v\|$, we have
\[ \|Av\| = \|A(r u)\| = r\, \|Au\| \le r\, \|A\| = \|v\|\, \|A\|, \]
proving the first inequality. To prove the second, we apply the first, replacing v by Bu:
\[ \|AB\| = \max \{ \|ABu\| \} = \max \{ \|A(Bu)\| \} \le \max \{ \|A\|\, \|Bu\| \} = \|A\| \max \{ \|Bu\| \} = \|A\|\, \|B\|. \]
Q.E.D.

Remark.  In  general,  a  norm  on  the  vector  space  of  n  x  n  matrices  is  called  a  matrix  norm 
if  it  also  satisfies  the  multiplicative  inequality  (3.42).  Most,  but  not  all,  matrix  norms  used 
in  applications  come  from  norms  on  the  underlying  vector  space. 


The multiplicative inequality (3.42) implies, in particular, that $\|A^2\| \le \|A\|^2$; equality is not necessarily valid. More generally:

Proposition 3.22. If A is a square matrix, then $\|A^k\| \le \|A\|^k$.


Let us determine the explicit formula for the matrix norm induced by the ∞ norm $\|v\|_\infty = \max \{ |v_1|, \dots, |v_n| \}$. The corresponding formula for the 1 norm is left as Exercise 3.3.48. The formula for the Euclidean matrix norm (2 norm) will be deferred until Theorem 8.71.

Definition 3.23. The ith absolute row sum of a matrix A is the sum of the absolute values of the entries in the ith row:
\[ s_i = |a_{i1}| + \cdots + |a_{in}| = \sum_{j=1}^n |a_{ij}|. \tag{3.43} \]


Proposition 3.24. The ∞ matrix norm of a matrix A is equal to its maximal absolute row sum:
\[ \|A\|_\infty = \max \{ s_1, \dots, s_n \} = \max \Bigl\{ \, \sum_{j=1}^n |a_{ij}| \;\Bigm|\; 1 \le i \le n \, \Bigr\}. \tag{3.44} \]


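Proof: Let $s = \max \{ s_1, \dots, s_n \}$ denote the right-hand side of (3.44). Given any $v \in \mathbb{R}^n$, we compute the ∞ norm of the image vector $Av$:
\[ \|Av\|_\infty = \max_{1 \le i \le n} \Bigl| \sum_{j=1}^n a_{ij} v_j \Bigr| \le \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij} v_j| \le \Bigl( \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}| \Bigr) \max_{1 \le j \le n} |v_j| = s\, \|v\|_\infty. \]
In particular, by specializing to a unit vector, $\|v\|_\infty = 1$, we deduce that $\|A\|_\infty \le s$. On the other hand, suppose the maximal absolute row sum occurs at row i, so
\[ s_i = \sum_{j=1}^n |a_{ij}| = s. \tag{3.45} \]
Let $u \in \mathbb{R}^n$ be the specific vector that has the following entries: $u_j = +1$ if $a_{ij} \ge 0$, while $u_j = -1$ if $a_{ij} < 0$. Then $\|u\|_\infty = 1$. Moreover, since $a_{ij} u_j = |a_{ij}|$, the ith entry of $Au$ is equal to the ith absolute row sum (3.45). This implies that $\|A\|_\infty \ge \|Au\|_\infty \ge s$. Q.E.D.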


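Example 3.25. Consider the symmetric matrix $A = \begin{pmatrix} \tfrac12 & -\tfrac13 \\ -\tfrac13 & \tfrac14 \end{pmatrix}$. Its two absolute row sums are $\bigl|\tfrac12\bigr| + \bigl|-\tfrac13\bigr| = \tfrac56$ and $\bigl|-\tfrac13\bigr| + \bigl|\tfrac14\bigr| = \tfrac{7}{12}$, so
\[ \|A\|_\infty = \max \bigl\{ \tfrac56, \tfrac{7}{12} \bigr\} = \tfrac56. \]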
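
The row-sum formula (3.44) and the sign-vector construction from its proof translate directly into code. The following is a minimal sketch using exact rational arithmetic; the function names are illustrative, and the sample matrix is assumed to be the one of Example 3.25, with rows $(\tfrac12, -\tfrac13)$ and $(-\tfrac13, \tfrac14)$.

```python
from fractions import Fraction as F

def inf_matrix_norm(A):
    """Maximal absolute row sum, as in formula (3.44)."""
    return max(sum(abs(a) for a in row) for row in A)

def mat_vec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mat_mul(A, B):
    """Matrix-matrix product A B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def maximizing_vector(A):
    """Sign vector from the proof: +1 or -1 matching the row of largest sum."""
    i = max(range(len(A)), key=lambda r: sum(abs(a) for a in A[r]))
    return [1 if a >= 0 else -1 for a in A[i]]

A = [[F(1, 2), F(-1, 3)], [F(-1, 3), F(1, 4)]]  # assumed reading of Example 3.25
u = maximizing_vector(A)

print(inf_matrix_norm(A))                    # maximal absolute row sum
print(max(abs(x) for x in mat_vec(A, u)))    # ||A u||_inf attains the same value
print(inf_matrix_norm(mat_mul(A, A)) <= inf_matrix_norm(A) ** 2)  # inequality (3.42) with B = A
```

Using `Fraction` keeps the row sums exact, so the comparison with $\tfrac56$ involves no rounding.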


Exercises 


3.3.45.  Compute  the  oo  matrix  norm  of  the  following  matrices. 

/ 


(a) 


H  (b)  (_f  J  I.  (c) 

6  /  V  6  6 


oo 


3.3.46.  Find  a  matrix  A  such  that  ||  A2 

3.3.47.  True  or  false:  If  B  =  S~1AS  are  similar  matrices,  then  \\B 

3.3.48.  (i)  Find  an  explicit  formula  for  the  1  matrix  norm  ||  A 


l- 


/ 


(d) 


1 

3 

1 

3 


oo 


A 


0 

0 


o  A  I 
\  u  3  3 


oo 


(ii)  Compute  the  1  matrix  norm  of  the  matrices  in  Exercise  3.3.45. 


0 


\ 


l 
3 

\)
3.3.49. Prove directly from the axioms of Definition 3.12 that (3.44) defines a norm on the
space of $n \times n$ matrices.


3.3.50.  Let  A  = 


norm 


1  l  \ 

2  j  .  Compute  the  natural  matrix  norm  ||A||  for  (a)  the  weighted  oo 
=  max{  2  |  v1  | ,  3 1  v2  |  } ;  ( b)  the  weighted  1  norm  1 1  v  1 1  =2 


v 


+  3 


v< 


♦ 3.3.51. The Frobenius norm of an $n \times n$ matrix A is defined as $\|A\|_F = \sqrt{\sum_{i,j=1}^n a_{ij}^2}$. Prove that this defines a matrix norm by checking the three norm axioms plus the multiplicative inequality (3.42).

3.3.52. Explain why $\|A\| = \max |a_{ij}|$ defines a norm on the space of $n \times n$ matrices. Show by example that this is not a matrix norm, i.e., (3.42) is not necessarily valid.


3.4  Positive  Definite  Matrices 

Let us now return to the study of inner products and fix our attention on the finite-dimensional
situation. Our immediate goal is to determine the most general inner product
that can be placed on the finite-dimensional vector space $\mathbb{R}^n$. The answer will lead
us to the important class of positive definite matrices, which appear in a wide range of
applications, including minimization problems, mechanics, electrical circuits, differential
equations, statistics, and numerical methods. Moreover, their infinite-dimensional counterparts,
positive definite linear operators, govern most boundary value problems arising
in continuum physics and engineering.

Suppose we are given an inner product $\langle x, y \rangle$ between vectors $x = (x_1\ x_2\ \dots\ x_n)^T$ and $y = (y_1\ y_2\ \dots\ y_n)^T$ in $\mathbb{R}^n$. Our goal is to determine its explicit formula. We begin by writing the vectors in terms of the standard basis vectors (2.17):
\[ x = x_1 e_1 + \cdots + x_n e_n = \sum_{i=1}^n x_i e_i, \qquad y = y_1 e_1 + \cdots + y_n e_n = \sum_{j=1}^n y_j e_j. \tag{3.46} \]
To evaluate their inner product, we will appeal to the three basic axioms. We first employ bilinearity to expand
\[ \langle x, y \rangle = \Bigl\langle \sum_{i=1}^n x_i e_i, \ \sum_{j=1}^n y_j e_j \Bigr\rangle = \sum_{i,j=1}^n x_i y_j \langle e_i, e_j \rangle. \]
Therefore,
\[ \langle x, y \rangle = \sum_{i,j=1}^n k_{ij}\, x_i y_j = x^T K y, \tag{3.47} \]
where K denotes the $n \times n$ matrix of inner products of the basis vectors, with entries
\[ k_{ij} = \langle e_i, e_j \rangle, \qquad i, j = 1, \dots, n. \tag{3.48} \]
We conclude that any inner product must be expressed in the general bilinear form (3.47).

The two remaining inner product axioms will impose certain constraints on the inner product matrix K. Symmetry implies that
\[ k_{ij} = \langle e_i, e_j \rangle = \langle e_j, e_i \rangle = k_{ji}, \qquad i, j = 1, \dots, n. \]
Consequently, the inner product matrix K must be symmetric:
\[ K = K^T. \]
Conversely, symmetry of K ensures symmetry of the bilinear form:
\[ \langle x, y \rangle = x^T K y = (x^T K y)^T = y^T K^T x = y^T K x = \langle y, x \rangle, \]
where the second equality follows from the fact that the quantity $x^T K y$ is a scalar, and hence equals its transpose.

The final condition for an inner product is positivity, which requires that
\[ \|x\|^2 = \langle x, x \rangle = x^T K x = \sum_{i,j=1}^n k_{ij}\, x_i x_j \ge 0 \qquad \text{for all } x \in \mathbb{R}^n, \tag{3.49} \]
with equality if and only if $x = 0$. The precise meaning of this positivity condition on the matrix K is not so immediately evident, and so will be encapsulated in a definition.


Definition 3.26. An $n \times n$ matrix K is called positive definite if it is symmetric, $K^T = K$, and satisfies the positivity condition
\[ x^T K x > 0 \qquad \text{for all } 0 \ne x \in \mathbb{R}^n. \tag{3.50} \]
We will sometimes write $K > 0$ to mean that K is a positive definite matrix.


Warning.  The  condition  K  >  0  does  not  mean  that  all  the  entries  of  K  are  positive.  There 
are  many  positive  definite  matrices  that  have  some  negative  entries;  see  Example  3.28 
below.  Conversely,  many  symmetric  matrices  with  all  positive  entries  are  not  positive 
definite! 

Remark. Although some authors allow non-symmetric matrices to be designated as positive
definite, we will say that a matrix is positive definite only when it is symmetric.
But,  to  underscore  our  convention  and  remind  the  casual  reader,  we  will  often  include  the 
superfluous  adjective  “symmetric”  when  speaking  of  positive  definite  matrices. 

Our  preliminary  analysis  has  resulted  in  the  following  general  characterization  of  inner 
products  on  a  finite-dimensional  vector  space. 

Theorem 3.27. Every inner product on $\mathbb{R}^n$ is given by
\[ \langle x, y \rangle = x^T K y \qquad \text{for } x, y \in \mathbb{R}^n, \tag{3.51} \]
where K is a symmetric, positive definite $n \times n$ matrix.
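
The construction (3.48) of the matrix K from the basis vectors is mechanical, and can be sketched as follows. This is an illustration only: the function names are hypothetical, and the sample inner product is the one written out in Example 3.28, supplied here as a plain Python function.

```python
def gram_matrix(inner, n):
    """Build K with entries k_ij = <e_i, e_j>, as in (3.48)."""
    basis = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return [[inner(basis[i], basis[j]) for j in range(n)] for i in range(n)]

def bilinear(K, x, y):
    """Evaluate x^T K y as the double sum in (3.47)."""
    return sum(K[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def inner(x, y):
    """Sample inner product on R^2 (the one appearing in Example 3.28)."""
    return 4 * x[0] * y[0] - 2 * x[0] * y[1] - 2 * x[1] * y[0] + 3 * x[1] * y[1]

K = gram_matrix(inner, 2)
x, y = [1, 2], [3, -1]  # arbitrary test vectors
print(K)
print(inner(x, y), bilinear(K, x, y))  # the two values agree
```

The matrix recovered this way is symmetric because the inner product is, and the two evaluations agree because bilinearity reduces everything to the values on basis vectors.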

Given a symmetric matrix K, the homogeneous quadratic polynomial
\[ q(x) = x^T K x = \sum_{i,j=1}^n k_{ij}\, x_i x_j, \tag{3.52} \]
is known as a quadratic form† on $\mathbb{R}^n$. The quadratic form is called positive definite if
\[ q(x) > 0 \qquad \text{for all } 0 \ne x \in \mathbb{R}^n. \tag{3.53} \]
So the quadratic form (3.52) is positive definite if and only if its coefficient matrix K is.


† Exercise 3.4.15 shows that the coefficient matrix K in any quadratic form can be taken to be
symmetric without any loss of generality.

Example 3.28. Even though the symmetric matrix $K = \begin{pmatrix} 4 & -2 \\ -2 & 3 \end{pmatrix}$ has two negative entries, it is, nevertheless, a positive definite matrix. Indeed, the corresponding quadratic form
\[ q(x) = x^T K x = 4x_1^2 - 4x_1 x_2 + 3x_2^2 = (2x_1 - x_2)^2 + 2x_2^2 \ge 0 \]
is a sum of two non-negative quantities. Moreover, $q(x) = 0$ if and only if both $2x_1 - x_2 = 0$ and $x_2 = 0$, which implies $x_1 = 0$ also. This proves $q(x) > 0$ for all $x \ne 0$, and hence K is indeed a positive definite matrix. The corresponding inner product on $\mathbb{R}^2$ is
\[ \langle x, y \rangle = (x_1\ \ x_2) \begin{pmatrix} 4 & -2 \\ -2 & 3 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = 4x_1y_1 - 2x_1y_2 - 2x_2y_1 + 3x_2y_2. \]
On the other hand, despite the fact that $K = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ has all positive entries, it is not a positive definite matrix. Indeed, writing out
\[ q(x) = x^T K x = x_1^2 + 4x_1 x_2 + x_2^2, \]
we find, for instance, that $q(1, -1) = -2 < 0$, violating positivity. These two simple examples should be enough to convince the reader that the problem of determining whether a given symmetric matrix is positive definite is not completely elementary.


Example 3.29. By definition, a general symmetric $2 \times 2$ matrix $K = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ is positive definite if and only if the associated quadratic form satisfies
\[ q(x) = a x_1^2 + 2 b x_1 x_2 + c x_2^2 > 0 \qquad \text{for all } x \ne 0. \tag{3.54} \]
Analytic geometry tells us that this is the case if and only if
\[ a > 0, \qquad a c - b^2 > 0, \tag{3.55} \]
i.e., the quadratic form has positive leading coefficient and positive determinant (or negative discriminant). A direct proof of this well-known fact will appear shortly.
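
The two conditions (3.55) give an immediate computational test in the $2 \times 2$ case. The sketch below is illustrative: the function names are chosen here, and the two sample matrices are those discussed in Example 3.28.

```python
def is_positive_definite_2x2(a, b, c):
    """Test (3.55) for the symmetric matrix K = [[a, b], [b, c]]."""
    return a > 0 and a * c - b * b > 0

def q(a, b, c, x1, x2):
    """Quadratic form (3.54): a x1^2 + 2 b x1 x2 + c x2^2."""
    return a * x1 * x1 + 2 * b * x1 * x2 + c * x2 * x2

print(is_positive_definite_2x2(4, -2, 3))  # matrix with two negative entries
print(is_positive_definite_2x2(1, 2, 1))   # matrix with all positive entries
print(q(1, 2, 1, 1, -1))                   # a point where the second form is negative
```

The first matrix passes even though it has negative entries, while the second fails despite having only positive entries, in agreement with the Warning following Definition 3.26.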


With  a  little  practice,  it  is  not  difficult  to  read  off  the  coefficient  matrix  K  from  the 
explicit  formula  for  the  quadratic  form  (3.52). 

Example 3.30. Consider the quadratic form
\[ q(x, y, z) = x^2 + 4xy + 6y^2 - 2xz + 9z^2 \]
depending upon three variables. The corresponding coefficient matrix is
\[ K = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 6 & 0 \\ -1 & 0 & 9 \end{pmatrix}, \qquad \text{whereby} \qquad q(x, y, z) = (x\ \ y\ \ z) \begin{pmatrix} 1 & 2 & -1 \\ 2 & 6 & 0 \\ -1 & 0 & 9 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]
Note that the squared terms in q contribute directly to the diagonal entries of K, while the mixed terms are split in half to give the symmetric off-diagonal entries. As a challenge, the reader might wish to try proving that this particular matrix is positive definite by establishing positivity of the quadratic form: $q(x, y, z) > 0$ for all nonzero $(x, y, z)^T \in \mathbb{R}^3$. Later, we will devise a simple, systematic test for positive definiteness.
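
The rule for reading off K (squared terms go on the diagonal, mixed terms are split in half) can be sketched in a few lines. The representation of the quadratic form as a dictionary of coefficients, and the function names, are assumptions made for this illustration; the sample coefficients are those of Example 3.30 with the variables numbered 0, 1, 2.

```python
def coefficient_matrix(n, coeffs):
    """Symmetric K for a quadratic form.

    coeffs maps an index pair (i, j), i <= j, to the coefficient of x_i x_j.
    """
    K = [[0.0] * n for _ in range(n)]
    for (i, j), c in coeffs.items():
        if i == j:
            K[i][i] += c          # squared term: straight onto the diagonal
        else:
            K[i][j] += c / 2      # mixed term: split evenly between
            K[j][i] += c / 2      # the two symmetric off-diagonal entries
    return K

def quadratic_form(K, x):
    """Evaluate q(x) = x^T K x."""
    n = len(x)
    return sum(K[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# q(x, y, z) = x^2 + 4xy + 6y^2 - 2xz + 9z^2
coeffs = {(0, 0): 1, (0, 1): 4, (1, 1): 6, (0, 2): -2, (2, 2): 9}
K = coefficient_matrix(3, coeffs)
print(K)
print(quadratic_form(K, [1, 1, 1]))  # equals 1 + 4 + 6 - 2 + 9
```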


Slightly more generally, a quadratic form and its associated symmetric coefficient matrix are called positive semi-definite if
\[ q(x) = x^T K x \ge 0 \qquad \text{for all } x \in \mathbb{R}^n, \tag{3.56} \]
in which case we write $K \ge 0$. A positive semi-definite matrix may have null directions, meaning non-zero vectors z such that $q(z) = z^T K z = 0$. Clearly, every nonzero vector $z \in \ker K$ in the coefficient matrix's kernel defines a null direction; for a general symmetric matrix there may be others, cf. Exercise 3.4.17. A positive definite matrix is not allowed to have null directions, and so $\ker K = \{0\}$. Recalling Proposition 2.42, we deduce that all positive definite matrices are nonsingular. The converse, however, is not valid; many symmetric, nonsingular matrices fail to be positive definite.


Proposition  3.31.  If  a  matrix  is  positive  definite,  then  it  is  nonsingular. 


Example 3.32. The matrix $K = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$ is positive semi-definite, but not positive definite. Indeed, the associated quadratic form
\[ q(x) = x^T K x = x_1^2 - 2x_1 x_2 + x_2^2 = (x_1 - x_2)^2 \ge 0 \]
is a perfect square, and so clearly non-negative. However, the elements of ker K, namely the scalar multiples of the vector $(1, 1)^T$, define null directions: $q(c, c) = 0$.

In a similar fashion, a quadratic form $q(x) = x^T K x$ and its associated symmetric matrix K are called negative semi-definite if $q(x) \le 0$ for all x and negative definite if $q(x) < 0$ for all $x \ne 0$. A quadratic form is called indefinite if it is neither positive nor negative semi-definite, equivalently, if there exist points $x_+$ where $q(x_+) > 0$ and points $x_-$ where $q(x_-) < 0$. Details can be found in the exercises.

Only positive definite matrices define inner products. However, indefinite matrices play a fundamental role in Einstein's theory of special relativity, [55]. In particular, the quadratic form associated with the matrix
\[ K = \begin{pmatrix} c^2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}, \quad \text{namely} \quad q(x) = x^T K x = c^2 t^2 - x^2 - y^2 - z^2, \quad \text{where} \quad x = \begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix}, \tag{3.57} \]
with c representing the speed of light, is the so-called Minkowski “metric” on relativistic space-time $\mathbb{R}^4$. The null directions form the light cone; see Exercise 3.4.20.
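
Indefiniteness of the Minkowski form is easy to see by evaluating it at a few points. The following sketch assumes the sign convention $q = c^2 t^2 - x^2 - y^2 - z^2$ and, purely for convenience, units in which $c = 1$; both the function name and these choices are assumptions of this illustration.

```python
def minkowski_q(t, x, y, z, c=1.0):
    """Minkowski quadratic form, assuming signature (+, -, -, -)."""
    return c * c * t * t - x * x - y * y - z * z

print(minkowski_q(1, 0, 0, 0))  # positive: a time-like direction
print(minkowski_q(0, 1, 0, 0))  # negative: a space-like direction
print(minkowski_q(1, 1, 0, 0))  # zero: a null direction on the light cone
```

Since the form takes both signs, it is indefinite, and the nonzero vectors on which it vanishes are its null directions.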


Exercises 


3.4.1.  Which  of  the  following  2x2  matrices  are  positive  definite? 


In  the  positive  definite  cases,  write  down  the  formula  for  the  associated  inner  product. 


3.4.2. Let $K = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}$. Prove that the associated quadratic form $q(x) = x^T K x$ is indefinite by finding a point $x_+$ where $q(x_+) > 0$ and a point $x_-$ where $q(x_-) < 0$.

♦ 3.4.3. (a) Prove that a diagonal matrix $D = \operatorname{diag}(c_1, c_2, \dots, c_n)$ is positive definite if and only if all its diagonal entries are positive: $c_i > 0$. (b) Write down and identify the associated inner product.


3.4.4.  Write  out  the  Cauchy-Schwarz  and  triangle  inequalities  for  the  inner  product  defined  in 
Example  3.28. 
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0  3.4.5.  (a)  Show  that  every  diagonal  entry  of  a  positive  definite  matrix  must  be  positive. 

(b)  Write  down  a  symmetric  matrix  with  all  positive  diagonal  entries  that  is  not  positive 
definite,  (c)  Find  a  nonzero  matrix  with  one  or  more  zero  diagonal  entries  that  is  positive 
semi-definite. 

3.4.6.  Prove  that  if  K  is  any  positive  definite  matrix,  then  every  positive  scalar  multiple  cK , 
c  >  0,  is  also  positive  definite. 

0  3.4.7.  (a)  Show  that  if  K  and  L  are  positive  definite  matrices,  so  is  K  +  L.  (b)  Give  an 
example  of  two  matrices  that  are  not  positive  definite  whose  sum  is  positive  definite. 

3.4.8.  Find  two  positive  definite  matrices  K  and  L  whose  product  KL  is  not  positive  definite. 

3.4.9.  Write  down  a  nonsingular  symmetric  matrix  that  is  not  positive  or  negative  definite. 

m  -i  rri 

0  3.4.10.  Let  K  be  a  nonsingular  symmetric  matrix,  (a)  Show  that  x  K~  x  =  y  K y,  where 
K y  =  x.  (b)  Prove  that  if  K  is  positive  definite,  then  so  is  K_1 . 

0  3.4.11.  Prove  that  an  n  x  n  symmetric  matrix  K  is  positive  definite  if  and  only  if,  for  every 
0^  v  G  Mn,  the  vectors  v  and  Kw  meet  at  an  acute  Euclidean  angle:  |  9  \  <  U- 

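0  3.4.12.  Prove that the inner product associated with a positive definite quadratic form $q(\mathbf{x})$ is given by the polarization formula $\langle \mathbf{x}, \mathbf{y} \rangle = \tfrac{1}{2}\,\bigl[\, q(\mathbf{x} + \mathbf{y}) - q(\mathbf{x}) - q(\mathbf{y}) \,\bigr]$.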

3.4.13.  (a) Is it possible for a quadratic form to be positive, $q(\mathbf{x}_+) > 0$, at only one point $\mathbf{x}_+ \in \mathbb{R}^n$? (b) Under what conditions is $q(\mathbf{x}_0) = 0$ at only one point?

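0  3.4.14.  (a) Let $K$ and $L$ be symmetric $n \times n$ matrices. Prove that $\mathbf{x}^T K \mathbf{x} = \mathbf{x}^T L\, \mathbf{x}$ for all $\mathbf{x} \in \mathbb{R}^n$ if and only if $K = L$. (b) Find an example of two non-symmetric matrices $K \neq L$ such that $\mathbf{x}^T K \mathbf{x} = \mathbf{x}^T L\, \mathbf{x}$ for all $\mathbf{x} \in \mathbb{R}^n$.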


0  3.4.15.  Suppose $q(\mathbf{x}) = \mathbf{x}^T A\, \mathbf{x} = \sum_{i,j=1}^{n} a_{ij}\, x_i x_j$ is a general quadratic form on $\mathbb{R}^n$, whose coefficient matrix $A$ is not necessarily symmetric. Prove that $q(\mathbf{x}) = \mathbf{x}^T K \mathbf{x}$, where $K = \tfrac{1}{2}(A + A^T)$ is a symmetric matrix. Therefore, we do not lose any generality by restricting our discussion to quadratic forms that are constructed from symmetric matrices.

3.4.16.  (a)  Show  that  a  symmetric  matrix  N  is  negative  definite  if  and  only  if  K  =  —  N  is 
positive  definite,  (b)  Write  down  two  explicit  criteria  that  tell  whether  a  2  x  2  matrix 
a  b 


N  = 


(0 


b  ( 

-1  1 
1  -2 


is  negative  definite,  (c)  Use  your  criteria  to  check  whether 


(li) 


-4  -5 

-5  -6 


(Hi) 


-3 

-1 


1 

2 


are  negative  definite. 


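3.4.17.  Show that $\mathbf{x} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ is a null direction for $K = \begin{pmatrix} 1 & -2 \\ -2 & 3 \end{pmatrix}$, but $\mathbf{x} \notin \ker K$.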


3.4.18.  Explain  why  an  indefinite  quadratic  form  necessarily  has  a  non-zero  null  direction. 

3.4.19.  Let $K = K^T$. True or false: (a) If $K$ admits a null direction, then $\ker K \neq \{\mathbf{0}\}$.
(b) If $K$ has no null directions, then $K$ is either positive or negative definite.

0  3.4.20.  In special relativity, light rays in Minkowski space-time $\mathbb{R}^n$ travel along the light cone which, by definition, consists of all null directions associated with an indefinite quadratic form $q(\mathbf{x}) = \mathbf{x}^T K \mathbf{x}$. Find and sketch a picture of the light cone when the coefficient matrix


K  is  (a)  (q 


(*>) 


1  2 
2  3 


(c) 


(1 

0 

0\ 

0 

-1 

0 

.  Remark.  In  the  physical 

\o 

0 

-V 

universe,  space-time  is  n  =  4-dimensional,  and  K  is  given  in  (3.57),  [55]. 

0  3.4.21.  A function $f(\mathbf{x})$ on $\mathbb{R}^n$ is called homogeneous of degree $k$ if $f(c\,\mathbf{x}) = c^k f(\mathbf{x})$ for all scalars $c$. (a) Given $\mathbf{a} \in \mathbb{R}^n$, show that the linear form $\ell(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x} = a_1 x_1 + \cdots + a_n x_n$ is homogeneous of degree 1. (b) Show that the quadratic form $q(\mathbf{x}) = \mathbf{x}^T K \mathbf{x} = \sum_{i,j=1}^{n} k_{ij}\, x_i x_j$ is homogeneous of degree 2.
(c) Find a homogeneous function of degree 2 on $\mathbb{R}^2$ that is not a quadratic form.


Gram  Matrices 


Symmetric matrices whose entries are given by inner products of elements of an inner product space will appear throughout this text. They are named after the nineteenth-century Danish mathematician Jørgen Gram, not the metric mass unit!


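Definition 3.33. Let $V$ be an inner product space, and let $\mathbf{v}_1, \ldots, \mathbf{v}_n \in V$. The associated Gram matrix

$$K = \begin{pmatrix} \langle \mathbf{v}_1, \mathbf{v}_1 \rangle & \langle \mathbf{v}_1, \mathbf{v}_2 \rangle & \cdots & \langle \mathbf{v}_1, \mathbf{v}_n \rangle \\ \langle \mathbf{v}_2, \mathbf{v}_1 \rangle & \langle \mathbf{v}_2, \mathbf{v}_2 \rangle & \cdots & \langle \mathbf{v}_2, \mathbf{v}_n \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \mathbf{v}_n, \mathbf{v}_1 \rangle & \langle \mathbf{v}_n, \mathbf{v}_2 \rangle & \cdots & \langle \mathbf{v}_n, \mathbf{v}_n \rangle \end{pmatrix} \tag{3.58}$$

is the $n \times n$ matrix whose entries are the inner products between the selected vector space elements.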


Symmetry of the inner product implies symmetry of the Gram matrix:

$$k_{ij} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle = \langle \mathbf{v}_j, \mathbf{v}_i \rangle = k_{ji}, \quad \text{and hence} \quad K^T = K. \tag{3.59}$$

In fact, the most direct method for producing positive definite and semi-definite matrices is through the Gram matrix construction.


Theorem 3.34. All Gram matrices are positive semi-definite. The Gram matrix (3.58) is positive definite if and only if $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are linearly independent.


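Proof: To prove positive (semi-)definiteness of $K$, we need to examine the associated quadratic form

$$q(\mathbf{x}) = \mathbf{x}^T K \mathbf{x} = \sum_{i,j=1}^{n} k_{ij}\, x_i x_j.$$

Substituting the values (3.59) for the matrix entries, we obtain

$$q(\mathbf{x}) = \sum_{i,j=1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j.$$

Bilinearity of the inner product on $V$ implies that we can assemble this summation into a single inner product

$$q(\mathbf{x}) = \Bigl\langle\, \sum_{i=1}^{n} x_i \mathbf{v}_i \,,\, \sum_{j=1}^{n} x_j \mathbf{v}_j \Bigr\rangle = \langle \mathbf{v}, \mathbf{v} \rangle = \|\mathbf{v}\|^2 \geq 0, \quad \text{where} \quad \mathbf{v} = \sum_{i=1}^{n} x_i \mathbf{v}_i$$

lies in the subspace of $V$ spanned by the given vectors. This immediately proves that $K$ is positive semi-definite.

Moreover, $q(\mathbf{x}) = \|\mathbf{v}\|^2 > 0$ as long as $\mathbf{v} \neq \mathbf{0}$. If $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are linearly independent, then

$$\mathbf{v} = x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n = \mathbf{0} \quad \text{if and only if} \quad x_1 = \cdots = x_n = 0,$$

and hence $q(\mathbf{x}) = 0$ if and only if $\mathbf{x} = \mathbf{0}$. This implies that $q(\mathbf{x})$ and hence $K$ are positive definite. Conversely, if the elements are linearly dependent, then some $\mathbf{x} \neq \mathbf{0}$ produces $\mathbf{v} = \mathbf{0}$, and so $q(\mathbf{x}) = 0$, which means that $K$ is not positive definite. Q.E.D.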


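Example 3.35. Consider the vectors $\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}$, $\mathbf{v}_2 = \begin{pmatrix} 3 \\ 0 \\ 6 \end{pmatrix}$. For the standard Euclidean dot product on $\mathbb{R}^3$, the Gram matrix is

$$K = \begin{pmatrix} \mathbf{v}_1 \cdot \mathbf{v}_1 & \mathbf{v}_1 \cdot \mathbf{v}_2 \\ \mathbf{v}_2 \cdot \mathbf{v}_1 & \mathbf{v}_2 \cdot \mathbf{v}_2 \end{pmatrix} = \begin{pmatrix} 6 & -3 \\ -3 & 45 \end{pmatrix}.$$

Since $\mathbf{v}_1, \mathbf{v}_2$ are linearly independent, $K > 0$. Positive definiteness implies that

$$q(x_1, x_2) = 6 x_1^2 - 6 x_1 x_2 + 45 x_2^2 > 0 \quad \text{for all} \quad (x_1, x_2) \neq \mathbf{0}.$$

Indeed, this can be checked directly, by using the criteria in (3.55).

On the other hand, for the weighted inner product

$$\langle \mathbf{v}, \mathbf{w} \rangle = 3\, v_1 w_1 + 2\, v_2 w_2 + 5\, v_3 w_3, \tag{3.60}$$

the corresponding Gram matrix is

$$K = \begin{pmatrix} \langle \mathbf{v}_1, \mathbf{v}_1 \rangle & \langle \mathbf{v}_1, \mathbf{v}_2 \rangle \\ \langle \mathbf{v}_2, \mathbf{v}_1 \rangle & \langle \mathbf{v}_2, \mathbf{v}_2 \rangle \end{pmatrix} = \begin{pmatrix} 16 & -21 \\ -21 & 207 \end{pmatrix}. \tag{3.61}$$

Since $\mathbf{v}_1, \mathbf{v}_2$ are still linearly independent (which, of course, does not depend upon which inner product is used), the matrix $K$ is also positive definite.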

In the case of the Euclidean dot product, the construction of the Gram matrix $K$ can be directly implemented as follows. Given column vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n \in \mathbb{R}^m$, let us form the $m \times n$ matrix $A = (\, \mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ \mathbf{v}_n \,)$. In view of the identification (3.2) between the dot product and multiplication of row and column vectors, the $(i,j)$ entry of $K$ is given as the product

$$k_{ij} = \mathbf{v}_i \cdot \mathbf{v}_j = \mathbf{v}_i^T \mathbf{v}_j$$

of the $i$-th row of the transpose $A^T$ and the $j$-th column of $A$. In other words, the Gram matrix can be evaluated as a matrix product:

$$K = A^T A. \tag{3.62}$$

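For the preceding Example 3.35,

$$A = \begin{pmatrix} 1 & 3 \\ 2 & 0 \\ -1 & 6 \end{pmatrix}, \quad \text{and so} \quad K = A^T A = \begin{pmatrix} 1 & 2 & -1 \\ 3 & 0 & 6 \end{pmatrix} \begin{pmatrix} 1 & 3 \\ 2 & 0 \\ -1 & 6 \end{pmatrix} = \begin{pmatrix} 6 & -3 \\ -3 & 45 \end{pmatrix}.$$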

Theorem  3.34  implies  that  the  Gram  matrix  (3.62)  is  positive  definite  if  and  only  if  the 
columns  of  A  are  linearly  independent  vectors.  This  implies  the  following  result. 


Proposition 3.36. Given an $m \times n$ matrix $A$, the following are equivalent:

(a) The $n \times n$ Gram matrix $K = A^T A$ is positive definite.

(b) $A$ has linearly independent columns.

(c) $\operatorname{rank} A = n \leq m$.

(d) $\ker A = \{\mathbf{0}\}$.

Changing the underlying inner product will, of course, change the Gram matrix. As noted in Theorem 3.27, every inner product on $\mathbb{R}^m$ has the form

$$\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T C\, \mathbf{w} \quad \text{for} \quad \mathbf{v}, \mathbf{w} \in \mathbb{R}^m, \tag{3.63}$$

where $C > 0$ is a symmetric, positive definite $m \times m$ matrix. Therefore, given $n$ vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n \in \mathbb{R}^m$, the entries of the Gram matrix with respect to this inner product are

$$k_{ij} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle = \mathbf{v}_i^T C\, \mathbf{v}_j.$$

If, as above, we assemble the column vectors into an $m \times n$ matrix $A = (\, \mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ \mathbf{v}_n \,)$, then the Gram matrix entry $k_{ij}$ is obtained by multiplying the $i$-th row of $A^T$ by the $j$-th column of the product matrix $C A$. Therefore, the Gram matrix based on the alternative inner product (3.63) is given by

$$K = A^T C A. \tag{3.64}$$

Theorem 3.34 immediately implies that $K$ is positive definite, provided that the matrix $A$ has rank $n$.

Theorem 3.37. Suppose $A$ is an $m \times n$ matrix with linearly independent columns. Suppose $C$ is any positive definite $m \times m$ matrix. Then the Gram matrix $K = A^T C A$ is a positive definite $n \times n$ matrix.

The Gram matrices constructed in (3.64) arise in a wide variety of applications, including least squares approximation theory (cf. Chapter 5), and mechanical structures and electrical circuits (cf. Chapters 6 and 10). In the majority of applications, $C = \operatorname{diag}(c_1, \ldots, c_m)$ is a diagonal positive definite matrix, which requires it to have strictly positive diagonal entries $c_i > 0$. This choice corresponds to a weighted inner product (3.10) on $\mathbb{R}^m$.


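Example 3.38. Returning to the situation of Example 3.35, the weighted inner product (3.60) corresponds to the diagonal positive definite matrix $C = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 5 \end{pmatrix}$. Therefore, the weighted Gram matrix (3.64) based on the vectors $\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}$, $\mathbf{v}_2 = \begin{pmatrix} 3 \\ 0 \\ 6 \end{pmatrix}$ is

$$K = A^T C A = \begin{pmatrix} 1 & 2 & -1 \\ 3 & 0 & 6 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 5 \end{pmatrix} \begin{pmatrix} 1 & 3 \\ 2 & 0 \\ -1 & 6 \end{pmatrix} = \begin{pmatrix} 16 & -21 \\ -21 & 207 \end{pmatrix},$$

reproducing (3.61).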
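The weighted computation can be reproduced in the same spirit. The sketch below, which assumes matrices stored as lists of rows and uses illustrative helper names (`weighted_inner`, `transpose`, `matmul`), evaluates the weighted Gram matrix once from the inner product (3.60) and once as the triple product $A^T C A$ in (3.64).

```python
def weighted_inner(weights):
    """Return the inner product <v, w> = sum_k c_k v_k w_k for positive weights c_k."""
    def inner(v, w):
        return sum(c * a * b for c, a, b in zip(weights, v, w))
    return inner


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


v1, v2 = [1, 2, -1], [3, 0, 6]
weights = [3, 2, 5]  # diagonal entries of C in Example 3.38

# Entry-by-entry evaluation using the weighted inner product (3.60).
inner = weighted_inner(weights)
K_entries = [[inner(u, w) for w in (v1, v2)] for u in (v1, v2)]

# Matrix evaluation K = A^T C A from (3.64).
A = transpose([v1, v2])
C = [[weights[i] if i == j else 0 for j in range(3)] for i in range(3)]
K_product = matmul(matmul(transpose(A), C), A)

print(K_entries)  # [[16, -21], [-21, 207]]
print(K_product)  # [[16, -21], [-21, 207]]
```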

The  Gram  matrix  construction  is  not  restricted  to  finite-dimensional  vector  spaces,  but 
also  applies  to  inner  products  on  function  space.  Here  is  a  particularly  important  example. 

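Example 3.39. Consider the vector space $C^0[0,1]$ consisting of continuous functions on the interval $0 \leq x \leq 1$, equipped with the $L^2$ inner product $\langle f, g \rangle = \int_0^1 f(x)\, g(x)\, dx$. Let us construct the Gram matrix corresponding to the simple monomial functions $1, x, x^2$. We compute the required inner products

$$\langle 1, 1 \rangle = \int_0^1 dx = 1, \qquad \langle 1, x \rangle = \int_0^1 x\, dx = \tfrac{1}{2}, \qquad \langle 1, x^2 \rangle = \int_0^1 x^2\, dx = \tfrac{1}{3},$$

$$\langle x, x \rangle = \int_0^1 x^2\, dx = \tfrac{1}{3}, \qquad \langle x, x^2 \rangle = \int_0^1 x^3\, dx = \tfrac{1}{4}, \qquad \langle x^2, x^2 \rangle = \int_0^1 x^4\, dx = \tfrac{1}{5}.$$

Therefore, the Gram matrix is

$$K = \begin{pmatrix} \langle 1, 1 \rangle & \langle 1, x \rangle & \langle 1, x^2 \rangle \\ \langle x, 1 \rangle & \langle x, x \rangle & \langle x, x^2 \rangle \\ \langle x^2, 1 \rangle & \langle x^2, x \rangle & \langle x^2, x^2 \rangle \end{pmatrix} = \begin{pmatrix} 1 & \tfrac{1}{2} & \tfrac{1}{3} \\ \tfrac{1}{2} & \tfrac{1}{3} & \tfrac{1}{4} \\ \tfrac{1}{3} & \tfrac{1}{4} & \tfrac{1}{5} \end{pmatrix}.$$

As we know, the monomial functions $1, x, x^2$ are linearly independent, and so Theorem 3.34 immediately implies that the matrix $K$ is positive definite.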

The alert reader may recognize this particular Gram matrix as the $3 \times 3$ Hilbert matrix that we encountered in (1.72). More generally, the Gram matrix corresponding to the monomials $1, x, x^2, \ldots, x^n$ has entries

$$k_{ij} = \langle x^{i-1}, x^{j-1} \rangle = \int_0^1 x^{i+j-2}\, dx = \frac{1}{i+j-1}, \qquad i, j = 1, \ldots, n+1,$$

and is thus the $(n+1) \times (n+1)$ Hilbert matrix (1.72): $K = H_{n+1}$. As a consequence of Theorem 3.34 and Proposition 3.31 (and also Exercise 2.3.36), we have proved the following non-trivial result.


Proposition 3.40. The $n \times n$ Hilbert matrix $H_n$ is positive definite. Consequently, $H_n$ is a nonsingular matrix.
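Proposition 3.40 can be explored in exact arithmetic. The following sketch, which is only an illustration and not a proof, builds the Hilbert matrix from the entry formula $1/(i+j-1)$ using Python's `fractions.Fraction` and reports the diagonal entries produced by Gaussian Elimination without row interchanges. The function names `hilbert` and `pivots` are illustrative assumptions.

```python
from fractions import Fraction


def hilbert(n):
    """The n x n Hilbert matrix with exact rational entries 1/(i + j - 1)."""
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def pivots(K):
    """Diagonal entries found by Gaussian Elimination without row interchanges.

    Elimination stops early if a zero appears in the pivot position.
    """
    M = [row[:] for row in K]  # work on a copy
    n = len(M)
    found = []
    for j in range(n):
        p = M[j][j]
        found.append(p)
        if p == 0:
            break
        for i in range(j + 1, n):
            m = M[i][j] / p
            for c in range(j, n):
                M[i][c] -= m * M[j][c]
    return found


print(pivots(hilbert(3)))  # [Fraction(1, 1), Fraction(1, 12), Fraction(1, 180)]
print(all(p > 0 for p in pivots(hilbert(6))))  # True
```

Exact rationals are used here because Hilbert matrices are badly conditioned, so floating-point pivots for larger $n$ would be less trustworthy.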


Example 3.41. Let us construct the Gram matrix corresponding to the trigonometric functions $1, \cos x, \sin x$, with respect to the inner product $\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\, g(x)\, dx$ on the interval $[-\pi, \pi]$. We compute the inner products

$$\langle 1, 1 \rangle = \int_{-\pi}^{\pi} dx = 2\pi, \qquad \langle 1, \cos x \rangle = \int_{-\pi}^{\pi} \cos x\, dx = 0,$$

$$\langle \cos x, \cos x \rangle = \int_{-\pi}^{\pi} \cos^2 x\, dx = \pi, \qquad \langle 1, \sin x \rangle = \int_{-\pi}^{\pi} \sin x\, dx = 0,$$

$$\langle \sin x, \sin x \rangle = \int_{-\pi}^{\pi} \sin^2 x\, dx = \pi, \qquad \langle \cos x, \sin x \rangle = \int_{-\pi}^{\pi} \cos x \sin x\, dx = 0.$$

Therefore, the Gram matrix is a simple diagonal matrix: $K = \begin{pmatrix} 2\pi & 0 & 0 \\ 0 & \pi & 0 \\ 0 & 0 & \pi \end{pmatrix}$. Positive definiteness of $K$ is immediately evident.
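The trigonometric inner products can also be approximated numerically. The sketch below uses a simple midpoint rule, which is an assumption made for illustration (any quadrature rule would do), to approximate the Gram matrix of $1, \cos x, \sin x$ on $[-\pi, \pi]$; the function name `l2_inner` and the number of subintervals are hypothetical choices.

```python
import math


def l2_inner(f, g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f(x) g(x) over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n))


functions = [lambda x: 1.0, math.cos, math.sin]

# Approximate Gram matrix with entries <f_i, f_j> on [-pi, pi].
K = [[l2_inner(f, g, -math.pi, math.pi) for g in functions] for f in functions]

for row in K:
    print(["%.6f" % entry for entry in row])
```

Up to rounding error, the off-diagonal entries vanish and the diagonal entries agree with $2\pi, \pi, \pi$.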


If the columns of $A$ are linearly dependent, then the associated Gram matrix is only positive semi-definite. In this case, the Gram matrix will have nontrivial null directions $\mathbf{v}$, so that $\mathbf{0} \neq \mathbf{v} \in \ker K = \ker A$.


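
Proposition 3.42. Let $K = A^T C A$ be the $n \times n$ Gram matrix constructed from an $m \times n$ matrix $A$ and a positive definite $m \times m$ matrix $C > 0$. Then $\ker K = \ker A$, and hence $\operatorname{rank} K = \operatorname{rank} A$.

Proof: Clearly, if $A\mathbf{x} = \mathbf{0}$, then $K\mathbf{x} = A^T C A\, \mathbf{x} = \mathbf{0}$, and so $\ker A \subset \ker K$. Conversely, if $K\mathbf{x} = \mathbf{0}$, then

$$0 = \mathbf{x}^T K \mathbf{x} = \mathbf{x}^T A^T C A\, \mathbf{x} = \mathbf{y}^T C\, \mathbf{y}, \quad \text{where} \quad \mathbf{y} = A\mathbf{x}.$$

Since $C > 0$, this implies $\mathbf{y} = \mathbf{0}$, and hence $\mathbf{x} \in \ker A$. Finally, by Theorem 2.49, $\operatorname{rank} K = n - \dim \ker K = n - \dim \ker A = \operatorname{rank} A$. Q.E.D.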
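To see the kernel statement in action, here is a small sketch with hypothetical data: the matrix `A` below has a second column equal to twice its first, and `C` is the diagonal weight matrix from Example 3.38. Neither this `A` nor the helper names come from the text. The vector $\mathbf{x} = (2, -1)^T$ lies in $\ker A$, and the code checks that it also lies in $\ker K$ and is a null direction of the quadratic form.

```python
def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def quadratic_form(K, x):
    """q(x) = x^T K x."""
    return sum(xi * yi for xi, yi in zip(x, matvec(K, x)))


# Hypothetical example: linearly dependent columns (second = 2 * first).
A = [[1, 2], [2, 4], [-1, -2]]
C = [[3, 0, 0], [0, 2, 0], [0, 0, 5]]
K = matmul(matmul(transpose(A), C), A)

x = [2, -1]  # satisfies A x = 0
print(K)                     # [[16, 32], [32, 64]]
print(matvec(A, x))          # [0, 0, 0]
print(matvec(K, x))          # [0, 0]
print(quadratic_form(K, x))  # 0
print(quadratic_form(K, [1, 0]))  # 16, positive away from the kernel
```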


Exercises 


3.4.22.  (a)  Find  the  Gram  matrix  corresponding  to  each  of  the  following  sets  of  vectors  using 


the  Euclidean  dot  product  on 


n 


2  \ 


(in) 


O) 


(0 


0 


1 

3 


0 

2 


(n) 


1 

2 


2 

3 


1 

1 


\ 

/°\ 

(  P 

(  2  \ 

/-i\ 

1 

1  ,  (0 

-2 

1 

-1 1 

-1 

7  \i 7 

V  2  7 

^  17 

^  17 

(m) 


/-i\ 

1\ 

—2  \ 

(~\\ 

fi\ 

—2  \ 

f-l\ 

/ 

°\ 

1 

,  (vii) 

2 

1 

3 

,  (via) 

0 

1 

0 

2 

0 

3 

-4 

1 

-1 

0 

1 

0 

-1 

-3 

V  1/ 

U7 

3J 

D27 

u7 

o7 

V 

o7 

1 

V-1 ) 

(  1\ 

0 
-1 

V  0 ) 

(b)  Which  are  positive  definite?  (c)  If  the  matrix  is  positive  semi-definite,  find  all  its  null 
directions. 

3.4.23.  Recompute the Gram matrices for cases (iii)-(v) in the previous exercise using the weighted inner product $\langle \mathbf{x}, \mathbf{y} \rangle = x_1 y_1 + 2\, x_2 y_2 + 3\, x_3 y_3$. Does this change its positive definiteness?

3.4.24.  Recompute  the  Gram  matrices  for  cases  (vi)-(viii)  in  Exercise  3.4.22  for  the  weighted 
inner  product  ( x  ,  y  )  =  x1  y1  +  \  x2  y2  +  \x3y3  +  \xAyA. 

3.4.25.  Find the Gram matrix $K$ for the functions $1, e^x, e^{2x}$ using the $L^2$ inner product on $[0,1]$. Is $K$ positive definite?

3.4.26.  Answer  Exercise  3.4.25  using  the  weighted  inner  product  (/  ,g)  =  J  f(x)g(x)e  x  dx. 

3.4.27.  Find the Gram matrix $K$ for the monomials $1, x, x^2, x^3$ using the $L^2$ inner product on $[-1, 1]$. Is $K$ positive definite?

3.4.28.  Answer Exercise 3.4.27 using the weighted inner product $\langle f, g \rangle = \int_{-1}^{1} f(x)\, g(x)\, (1 + x)\, dx$.


3.4.29.  Let  K  be  a  2  x  2  Gram  matrix.  Explain  why  the  positive  definiteness  criterion  (3.55)  is 
equivalent  to  the  Cauchy-Schwarz  inequality. 

A)  3.4.30.  (a) Prove that if $K$ is a positive definite matrix, then $K^2$ is also positive definite.

(b) More generally, if $S = S^T$ is symmetric and nonsingular, then $S^2$ is positive definite.

3.4.31.  Let $A$ be an $m \times n$ matrix. (a) Explain why the product $L = A A^T$ is a Gram matrix. (b) Show that, even though they may be of different sizes, both Gram matrices $K = A^T A$ and $L = A A^T$ have the same rank. (c) Under what conditions are both $K$ and $L$ positive definite?

0  3.4.32.  Let $K = A^T C A$, where $C > 0$. Prove that

(a) $\ker K = \operatorname{coker} K = \ker A$; (b) $\operatorname{img} K = \operatorname{coimg} K = \operatorname{coimg} A$.

0  3.4.33.  Prove  that  every  positive  definite  matrix  K  can  be  written  as  a  Gram  matrix. 


3.4.34.  Suppose $K$ is the Gram matrix computed from $\mathbf{v}_1, \ldots, \mathbf{v}_n \in V$ relative to a given inner product. Let $\widetilde{K}$ be the Gram matrix for the same elements, but computed relative to a different inner product. Show that $K > 0$ if and only if $\widetilde{K} > 0$.

<0  3.4.35.  Let $K_1 = A_1^T C_1 A_1$ and $K_2 = A_2^T C_2 A_2$ be any two $n \times n$ Gram matrices. Let $K = K_1 + K_2$. (a) Show that if $K_1, K_2 > 0$ then $K > 0$. (b) Give an example in which $K_1$ and $K_2$ are not positive definite, but $K > 0$. (c) Show that $K$ is also a Gram matrix, by finding a matrix $A$ such that $K = A^T C A$. Hint: $A$ will have size $(m_1 + m_2) \times n$, where $m_1$ and $m_2$ are the numbers of rows in $A_1, A_2$, respectively.


3.4.36.  Show that $\mathbf{0} \neq \mathbf{z}$ is a null direction for the quadratic form $q(\mathbf{x}) = \mathbf{x}^T K \mathbf{x}$ based on the Gram matrix $K = A^T C A$ if and only if $\mathbf{z} \in \ker K$.


3.5  Completing  the  Square 


Gram  matrices  furnish  us  with  an  almost  inexhaustible  supply  of  positive  definite  matrices. 
However,  we  still  do  not  know  how  to  test  whether  a  given  symmetric  matrix  is  positive 
definite.  As  we  shall  soon  see,  the  secret  already  appears  in  the  particular  computations 
in  Examples  3.2  and  3.28. 

You may recall the algebraic technique known as “completing the square”, first arising in the derivation of the formula for the solution to the quadratic equation

$$q(x) = a x^2 + 2 b x + c = 0, \tag{3.65}$$

and, later, helping to facilitate the integration of various types of rational and algebraic functions. The idea is to combine the first two terms in (3.65) as a perfect square, and thereby rewrite the quadratic function in the form

$$q(x) = a \left( x + \frac{b}{a} \right)^2 + \frac{a c - b^2}{a} = 0. \tag{3.66}$$

As a consequence,

$$\left( x + \frac{b}{a} \right)^2 = \frac{b^2 - a c}{a^2}.$$

The familiar quadratic formula

$$x = \frac{-\,b \pm \sqrt{b^2 - a c}}{a}$$

follows by taking the square root of both sides and then solving for $x$. The intermediate step (3.66), where we eliminate the linear term, is known as completing the square.

We can perform the same kind of manipulation on a homogeneous quadratic form

$$q(x_1, x_2) = a x_1^2 + 2 b x_1 x_2 + c x_2^2. \tag{3.67}$$

In this case, provided $a \neq 0$, completing the square amounts to writing

$$q(x_1, x_2) = a x_1^2 + 2 b x_1 x_2 + c x_2^2 = a \left( x_1 + \frac{b}{a}\, x_2 \right)^2 + \frac{a c - b^2}{a}\, x_2^2 = a\, y_1^2 + \frac{a c - b^2}{a}\, y_2^2. \tag{3.68}$$

The net result is to re-express $q(x_1, x_2)$ as a simpler sum of squares of the new variables

$$y_1 = x_1 + \frac{b}{a}\, x_2, \qquad y_2 = x_2. \tag{3.69}$$

It is not hard to see that the final expression in (3.68) is positive definite, as a function of $y_1, y_2$, if and only if both coefficients are positive:

$$a > 0, \qquad \frac{a c - b^2}{a} > 0. \tag{3.70}$$

Therefore, $q(x_1, x_2) \geq 0$, with equality if and only if $y_1 = y_2 = 0$, or, equivalently, $x_1 = x_2 = 0$. This conclusively proves that conditions (3.70) are necessary and sufficient for the quadratic form (3.67) to be positive definite.
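The two-variable criterion is easy to experiment with. The following sketch, whose function names are illustrative assumptions, tests conditions (3.70) in exact rational arithmetic and compares the original expression (3.67) with the completed-square expression (3.68) at a sample point.

```python
from fractions import Fraction


def is_positive_definite_2x2(a, b, c):
    """Test conditions (3.70) for q(x1, x2) = a x1^2 + 2 b x1 x2 + c x2^2."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return a > 0 and (a * c - b * b) / a > 0


def q_direct(a, b, c, x1, x2):
    """The quadratic form (3.67) evaluated directly."""
    return a * x1**2 + 2 * b * x1 * x2 + c * x2**2


def q_completed(a, b, c, x1, x2):
    """The sum of squares on the right-hand side of (3.68); requires a != 0."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    y1 = x1 + b / a * x2  # new variables (3.69)
    y2 = x2
    return a * y1**2 + (a * c - b * b) / a * y2**2


# Coefficients a = 6, b = -3, c = 45 give the form 6 x1^2 - 6 x1 x2 + 45 x2^2.
print(is_positive_definite_2x2(6, -3, 45))  # True
print(q_direct(6, -3, 45, 2, -1))           # 81
print(q_completed(6, -3, 45, 2, -1))        # 81

# Coefficients a = 1, b = 2, c = 1 violate the second condition in (3.70).
print(is_positive_definite_2x2(1, 2, 1))    # False
```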

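Our goal is to adapt this simple idea to analyze the positivity of quadratic forms depending on more than two variables. To this end, let us rewrite the quadratic form identity (3.68) in matrix form. The original quadratic form (3.67) is

$$q(\mathbf{x}) = \mathbf{x}^T K \mathbf{x}, \quad \text{where} \quad K = \begin{pmatrix} a & b \\ b & c \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{3.71}$$

Similarly, the right-hand side of (3.68) can be written as

$$\widehat{q}(\mathbf{y}) = \mathbf{y}^T D\, \mathbf{y}, \quad \text{where} \quad D = \begin{pmatrix} a & 0 \\ 0 & \dfrac{a c - b^2}{a} \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}. \tag{3.72}$$

Anticipating the final result, the equations (3.69) connecting $\mathbf{x}$ and $\mathbf{y}$ can themselves be written in matrix form as

$$\mathbf{y} = L^T \mathbf{x}, \quad \text{or} \quad \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 + \dfrac{b}{a}\, x_2 \\ x_2 \end{pmatrix}, \quad \text{where} \quad L^T = \begin{pmatrix} 1 & \dfrac{b}{a} \\ 0 & 1 \end{pmatrix}.$$

Substituting into (3.72), we obtain

$$\mathbf{y}^T D\, \mathbf{y} = (L^T \mathbf{x})^T D\, (L^T \mathbf{x}) = \mathbf{x}^T L\, D\, L^T \mathbf{x} = \mathbf{x}^T K \mathbf{x}, \quad \text{where} \quad K = L\, D\, L^T. \tag{3.73}$$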


The result is the same factorization (1.61) of the coefficient matrix that we previously obtained via Gaussian Elimination. We are thus led to the realization that completing the square is the same as the $L D L^T$ factorization of a symmetric matrix!

Recall the definition of a regular matrix as one that can be reduced to upper triangular form without any row interchanges. Theorem 1.34 says that the regular symmetric matrices are precisely those that admit an $L D L^T$ factorization. The identity (3.73) is therefore valid for all regular $n \times n$ symmetric matrices, and shows how to write the associated quadratic form as a sum of squares:

$$q(\mathbf{x}) = \mathbf{x}^T K \mathbf{x} = \mathbf{y}^T D\, \mathbf{y} = d_1 y_1^2 + \cdots + d_n y_n^2, \quad \text{where} \quad \mathbf{y} = L^T \mathbf{x}. \tag{3.74}$$

The coefficients $d_i$ are the diagonal entries of $D$, which are the pivots of $K$. Furthermore, the diagonal quadratic form is positive definite, $\mathbf{y}^T D\, \mathbf{y} > 0$ for all $\mathbf{y} \neq \mathbf{0}$, if and only if all the pivots are positive, $d_i > 0$. Invertibility of $L^T$ tells us that $\mathbf{y} = \mathbf{0}$ if and only if $\mathbf{x} = \mathbf{0}$, and hence, positivity of the pivots is equivalent to positive definiteness of the original quadratic form: $q(\mathbf{x}) > 0$ for all $\mathbf{x} \neq \mathbf{0}$. We have thus almost proved the main result that completely characterizes positive definite matrices.


Theorem  3.43.  A  symmetric  matrix  is  positive  definite  if  and  only  if  it  is  regular  and  has 
all  positive  pivots. 




Equivalently, a square matrix $K$ is positive definite if and only if it can be factored $K = L D L^T$, where $L$ is lower unitriangular and $D$ is diagonal with all positive diagonal entries.

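Example 3.44. Consider the symmetric matrix $K = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 6 & 0 \\ -1 & 0 & 9 \end{pmatrix}$. Gaussian Elimination produces the factors

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & 1 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 6 \end{pmatrix}, \qquad L^T = \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix},$$

in its factorization $K = L D L^T$. Since the pivots (the diagonal entries 1, 2, and 6 in $D$) are all positive, Theorem 3.43 implies that $K$ is positive definite, which means that the associated quadratic form satisfies

$$q(\mathbf{x}) = x_1^2 + 4 x_1 x_2 - 2 x_1 x_3 + 6 x_2^2 + 9 x_3^2 > 0, \quad \text{for all} \quad \mathbf{x} = (x_1, x_2, x_3)^T \neq \mathbf{0}.$$

Indeed, the $L D L^T$ factorization implies that $q(\mathbf{x})$ can be explicitly written as a sum of squares:

$$q(\mathbf{x}) = x_1^2 + 4 x_1 x_2 - 2 x_1 x_3 + 6 x_2^2 + 9 x_3^2 = y_1^2 + 2 y_2^2 + 6 y_3^2, \tag{3.75}$$

where

$$y_1 = x_1 + 2 x_2 - x_3, \qquad y_2 = x_2 + x_3, \qquad y_3 = x_3,$$

are the entries of $\mathbf{y} = L^T \mathbf{x}$. Positivity of the coefficients of the $y_i^2$ (which are the pivots) implies that $q(\mathbf{x})$ is positive definite.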
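The factors in Example 3.44 can be recomputed mechanically. The sketch below is one possible implementation of Gaussian Elimination without row interchanges for a symmetric input, written in exact rational arithmetic; the function names `ldlt`, `quadratic_form`, and `sum_of_squares` are illustrative, and the code assumes (without checking) that the input is symmetric.

```python
from fractions import Fraction


def ldlt(K):
    """Return (L, d) with K = L diag(d) L^T for a regular symmetric matrix K.

    Raises ValueError if a zero pivot appears, i.e. K is not regular.
    """
    n = len(K)
    M = [[Fraction(x) for x in row] for row in K]  # working copy
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    d = []
    for j in range(n):
        p = M[j][j]
        if p == 0:
            raise ValueError("zero pivot: matrix is not regular")
        d.append(p)
        for i in range(j + 1, n):
            L[i][j] = M[i][j] / p  # elimination multiple stored in L
            for c in range(j, n):
                M[i][c] -= L[i][j] * M[j][c]
    return L, d


def quadratic_form(K, x):
    """q(x) = x^T K x."""
    n = len(x)
    return sum(K[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def sum_of_squares(L, d, x):
    """Evaluate d_1 y_1^2 + ... + d_n y_n^2 with y = L^T x, as in (3.74)."""
    n = len(x)
    y = [sum(L[i][j] * x[i] for i in range(n)) for j in range(n)]
    return sum(dj * yj**2 for dj, yj in zip(d, y))


K = [[1, 2, -1], [2, 6, 0], [-1, 0, 9]]
L, d = ldlt(K)
print([[int(v) for v in row] for row in L])  # [[1, 0, 0], [2, 1, 0], [-1, 1, 1]]
print([int(v) for v in d])                   # [1, 2, 6]

x = [1, -2, 3]
print(quadratic_form(K, x), sum_of_squares(L, d, x))  # 92 92
```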

Example 3.45. Let's test whether the matrix $K = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 7 \\ 3 & 7 & 8 \end{pmatrix}$ is positive definite. When we perform Gaussian Elimination, the second pivot turns out to be $-1$, which immediately implies that $K$ is not positive definite, even though all its entries are positive. (Continuing the elimination produces $0$ in the third diagonal position, but this does not affect the conclusion; all it takes is one non-positive pivot to disqualify a matrix from being positive definite. Also, row interchanges aren't of any help, since we are not allowed to perform them when checking for positive definiteness.) This means that the associated quadratic form

$$q(\mathbf{x}) = x_1^2 + 4 x_1 x_2 + 6 x_1 x_3 + 3 x_2^2 + 14 x_2 x_3 + 8 x_3^2$$

assumes negative values at some points. For instance, $q(-2, 1, 0) = -1$.
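The pivot test of Theorem 3.43 translates directly into a short routine. The following sketch, with the illustrative name `is_positive_definite`, runs Gaussian Elimination without row interchanges and stops as soon as a non-positive pivot appears; it assumes the input is symmetric and does not verify this.

```python
from fractions import Fraction


def is_positive_definite(K):
    """Pivot test for a symmetric matrix K.

    Returns (verdict, pivots), where pivots lists the diagonal entries
    examined before the test finished or failed.
    """
    M = [[Fraction(x) for x in row] for row in K]  # working copy
    n = len(M)
    pivots = []
    for j in range(n):
        p = M[j][j]
        pivots.append(p)
        if p <= 0:
            return False, pivots  # one non-positive pivot is enough
        for i in range(j + 1, n):
            m = M[i][j] / p
            for c in range(j, n):
                M[i][c] -= m * M[j][c]
    return True, pivots


def quadratic_form(K, x):
    """q(x) = x^T K x."""
    n = len(x)
    return sum(K[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


K = [[1, 2, 3], [2, 3, 7], [3, 7, 8]]  # matrix of Example 3.45
verdict, pivots = is_positive_definite(K)
print(verdict, [int(p) for p in pivots])  # False [1, -1]
print(quadratic_form(K, [-2, 1, 0]))      # -1
```

Stopping at the first non-positive pivot mirrors the remark in the example that a single failure settles the question.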

A direct method for completing the square in a quadratic form goes as follows: The first step is to put all the terms involving $x_1$ in a suitable square, at the expense of introducing extra terms involving only the other variables. For instance, in the case of the quadratic form in (3.75), the terms involving $x_1$ can be written as

$$x_1^2 + 4 x_1 x_2 - 2 x_1 x_3 = (x_1 + 2 x_2 - x_3)^2 - 4 x_2^2 + 4 x_2 x_3 - x_3^2.$$

Therefore,

$$q(\mathbf{x}) = (x_1 + 2 x_2 - x_3)^2 + 2 x_2^2 + 4 x_2 x_3 + 8 x_3^2 = (x_1 + 2 x_2 - x_3)^2 + \widetilde{q}(x_2, x_3),$$

where

$$\widetilde{q}(x_2, x_3) = 2 x_2^2 + 4 x_2 x_3 + 8 x_3^2$$

is a quadratic form that involves only $x_2, x_3$. We then repeat the process, combining all the terms involving $x_2$ in the remaining quadratic form into a square, writing

$$\widetilde{q}(x_2, x_3) = 2\, (x_2 + x_3)^2 + 6 x_3^2.$$

This gives the final form

$$q(\mathbf{x}) = (x_1 + 2 x_2 - x_3)^2 + 2\, (x_2 + x_3)^2 + 6 x_3^2,$$

which reproduces (3.75).

In general, as long as $k_{11} \neq 0$, we can write

$$\begin{aligned} q(\mathbf{x}) = \mathbf{x}^T K \mathbf{x} &= k_{11} x_1^2 + 2 k_{12} x_1 x_2 + \cdots + 2 k_{1n} x_1 x_n + k_{22} x_2^2 + \cdots + k_{nn} x_n^2 \\ &= k_{11} \left( x_1 + \frac{k_{12}}{k_{11}}\, x_2 + \cdots + \frac{k_{1n}}{k_{11}}\, x_n \right)^2 + \widetilde{q}(x_2, \ldots, x_n) \\ &= k_{11}\, (x_1 + l_{21} x_2 + \cdots + l_{n1} x_n)^2 + \widetilde{q}(x_2, \ldots, x_n), \end{aligned} \tag{3.76}$$

where

$$l_{21} = \frac{k_{21}}{k_{11}} = \frac{k_{12}}{k_{11}}, \quad \ldots, \quad l_{n1} = \frac{k_{n1}}{k_{11}} = \frac{k_{1n}}{k_{11}}$$

are precisely the multiples appearing in the matrix $L$ obtained from applying Gaussian Elimination to $K$, while

$$\widetilde{q}(x_2, \ldots, x_n) = \sum_{i,j=2}^{n} \widetilde{k}_{ij}\, x_i x_j$$

is a quadratic form involving one less variable. The entries of its symmetric coefficient matrix $\widetilde{K}$ are

$$\widetilde{k}_{ij} = \widetilde{k}_{ji} = k_{ij} - l_{j1} k_{1i} = k_{ij} - \frac{k_{1j}\, k_{1i}}{k_{11}}, \qquad i, j = 2, \ldots, n,$$

which are exactly the same as the entries appearing below and to the right of the first pivot after applying the first phase of the Gaussian Elimination process to $K$. In particular, the second pivot of $K$ is the diagonal entry $\widetilde{k}_{22}$. We continue by applying the same procedure to the reduced quadratic form $\widetilde{q}(x_2, \ldots, x_n)$ and repeating until only the final variable remains. Completing the square at each stage reproduces the corresponding phase of the Gaussian Elimination process. The final result is our formula (3.74) rewriting the original quadratic form as a sum of squares whose coefficients are the pivots.
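The reduction step can be carried out one variable at a time. The sketch below implements a single step of (3.76) under the assumption that the input is symmetric with nonzero upper left entry, and then repeats it to collect the coefficients of the resulting sum of squares. The names `reduce_once` and `sum_of_squares_coefficients` are hypothetical.

```python
from fractions import Fraction


def reduce_once(K):
    """One completing-the-square step (3.76) for a symmetric matrix K.

    Returns (k11, multipliers, reduced), where multipliers = [l_21, ..., l_n1]
    and reduced has entries k_ij - k_1i k_1j / k_11 for i, j >= 2.
    """
    k11 = Fraction(K[0][0])
    if k11 == 0:
        raise ValueError("k11 = 0: the step (3.76) does not apply")
    n = len(K)
    multipliers = [Fraction(K[i][0]) / k11 for i in range(1, n)]
    reduced = [[Fraction(K[i][j]) - Fraction(K[0][i]) * Fraction(K[0][j]) / k11
                for j in range(1, n)]
               for i in range(1, n)]
    return k11, multipliers, reduced


def sum_of_squares_coefficients(K):
    """Repeat the reduction until no variables remain; returns the pivots."""
    coefficients = []
    while K:
        k11, _, K = reduce_once(K)
        coefficients.append(k11)
    return coefficients


K = [[1, 2, -1], [2, 6, 0], [-1, 0, 9]]  # matrix of the quadratic form (3.75)
k11, multipliers, reduced = reduce_once(K)
print(int(k11), [int(m) for m in multipliers])   # 1 [2, -1]
print([[int(v) for v in row] for row in reduced])  # [[2, 2], [2, 8]]
print([int(c) for c in sum_of_squares_coefficients(K)])  # [1, 2, 6]
```

The reduced matrix printed here is the coefficient matrix of the two-variable form $2 x_2^2 + 4 x_2 x_3 + 8 x_3^2$ obtained after the first square is split off.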

With this in hand, we can now complete the proof of Theorem 3.43. First, if the upper left entry $k_{11}$, namely the first pivot, is not strictly positive, then $K$ cannot be positive definite, because $q(\mathbf{e}_1) = \mathbf{e}_1^T K \mathbf{e}_1 = k_{11} \leq 0$. Otherwise, suppose $k_{11} > 0$, and so we can write $q(\mathbf{x})$ in the form (3.76). We claim that $q(\mathbf{x})$ is positive definite if and only if the reduced quadratic form $\widetilde{q}(x_2, \ldots, x_n)$ is positive definite. Indeed, if $\widetilde{q}$ is positive definite and $k_{11} > 0$, then $q(\mathbf{x})$ is the sum of two non-negative quantities, which simultaneously vanish if and only if $x_1 = x_2 = \cdots = x_n = 0$. On the other hand, suppose $\widetilde{q}(x_2^*, \ldots, x_n^*) \leq 0$ for some $x_2^*, \ldots, x_n^*$, not all zero. Setting $x_1^* = -\,l_{21} x_2^* - \cdots - l_{n1} x_n^*$ makes the initial square term in (3.76) equal to 0, so

$$q(x_1^*, x_2^*, \ldots, x_n^*) = \widetilde{q}(x_2^*, \ldots, x_n^*) \leq 0,$$

proving the claim. In particular, positive definiteness of $\widetilde{q}$ requires that the second pivot satisfy $\widetilde{k}_{22} > 0$. We then continue the reduction procedure outlined in the preceding paragraph; if a non-positive entry appears in the diagonal pivot position at any stage, the original quadratic form and matrix cannot be positive definite. On the other hand, finding all positive pivots (without using any row interchanges) will, in the absence of numerical errors, ensure positive definiteness. Q.E.D.


Exercises 


3.5.1.  Are the following matrices positive definite?  (a)


4 

-2 


2 

4 


(b) 


1 

1 


1 

1 


(c) 


(1 

1 

2\ 

/ 

1 

2 

1 

.  (d) 

2 

1 

i/  ' 

1 

2 

2 


1\ 

2 

4  ) 


(e) 


(2 

1 

1 

n 

/-I 

1 

1 

1\ 

1 

2 

1 

i 

.  (f) 

1 

-1 

1 

1 

1 

1 

2 

i 

1 

1 

-1 

1 

U 

1 

1 

2  ) 

^  1 

1 

1 

-1/ 

3.5.2.  Find  an  LDLT  factorization  of  the  following  symmetric  matrices.  Which  are  positive 


definite?  (a) 


(e) 


/ 


V 


2 

1 

2 


1 

1 

3 


1 
2 

—2  \ 
-3 
11/ 


2 

3 


( b ) 


5  -1 
1  3 


(c) 


/ 


V 


3 

1 

3 


-1 

5 

1 


( d ) 


/  —2 
1 

v-i 


1 

2 

1 


(0 


/I 

l 

1 

o\ 

(3 

2 

1 

°\ 

/  2 

1 

-2 

°\ 

1 

2 

0 

1 

.  (g) 

2 

3 

0 

1 

.  (^) 

1 

1 

-3 

2 

1 

0 

1 

1 

1 

0 

3 

2 

-2 

-3 

10  -1 

\o 

1 

1 

2  / 

1 

2 

3  / 

l  o 

2 

-1 

7/ 

3.5.3.  (a) For which values of $c$ is the matrix $A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & c & 1 \\ 0 & 1 & 1 \end{pmatrix}$ positive definite? (b) For the particular value $c = 3$, carry out elimination to find the factorization $A = L D L^T$. (c) Use your result from part (b) to rewrite the quadratic form $q(x, y, z) = x^2 + 2 x y + 3 y^2 + 2 y z + z^2$ as a sum of squares. (d) Explain how your result is related to the positive definiteness of $A$.

3.5.4.  Write the quadratic form $q(\mathbf{x}) = x_1^2 + x_1 x_2 + 2 x_2^2 - x_1 x_3 + 3 x_3^2$ in the form $q(\mathbf{x}) = \mathbf{x}^T K \mathbf{x}$ for some symmetric matrix $K$. Is $q(\mathbf{x})$ positive definite?

3.5.5.  Write the following quadratic forms on $\mathbb{R}^2$ as a sum of squares. Which are positive definite? (a) $x^2 + 8 x y + y^2$, (b) $x^2 - 4 x y + 7 y^2$, (c) $x^2 - 2 x y - y^2$, (d) $x^2 + 6 x y$.

3.5.6.  Prove  that  the  following  quadratic  forms  on  R3  are  positive  definite  by  writing  each  as  a 

sum  of  squares:  (a)  x2  +  \xz  +  3y2  +  5z2,  (b)  x2 -j- 3xy -h  3y2  —  2x z -h  8z2 , 

(c)  2x2  +  x1  x2  —  2x1  x3  +  2^2  —  2x2  x3  +  2x2. 

3.5.7.  Write the following quadratic forms in matrix notation and determine if they are positive definite: (a) $x^2 + 4 x z + 2 y^2 + 8 y z + 12 z^2$, (b) $3 x^2 - 2 y^2 - 8 x y + x z + z^2$,
(c) $x^2 + 2 x y + 2 y^2 - 4 x z - 6 y z + 6 z^2$, (d) $3 x_1^2 - x_2^2 + 5 x_3^2 + 4 x_1 x_2 - 7 x_1 x_3 + 9 x_2 x_3$,
(e) $x_1^2 + 4 x_1 x_2 - 2 x_1 x_3 + 5 x_2^2 - 2 x_2 x_4 + 6 x_3^2 - x_3 x_4 + 4 x_4^2$.

3.5.8.  For what values of $a$, $b$, and $c$ is the quadratic form $x^2 + a x y + y^2 + b x z + c y z + z^2$ positive definite?

3.5.9.  True or false: Every planar quadratic form $q(x, y) = a x^2 + 2 b x y + c y^2$ can be written as a sum of squares.

3.5.10.  (a) Prove that a positive definite matrix has positive determinant: $\det K > 0$.
(b) Show that a positive definite matrix has positive trace: $\operatorname{tr} K > 0$. (c) Show that every $2 \times 2$ symmetric matrix with positive determinant and positive trace is positive definite.
(d) Find a symmetric $3 \times 3$ matrix with positive determinant and positive trace that is not positive definite.


3.5.11.  (a) Prove that if $K_1, K_2$ are positive definite $n \times n$ matrices, then $K = \begin{pmatrix} K_1 & O \\ O & K_2 \end{pmatrix}$ is a positive definite $2n \times 2n$ matrix. (b) Is the converse true?

3.5.12.  Let $\| \cdot \|$ be any norm on $\mathbb{R}^n$. (a) Show that $q(\mathbf{x})$ is a positive definite quadratic form if and only if $q(\mathbf{u}) > 0$ for all unit vectors, $\| \mathbf{u} \| = 1$. (b) Prove that if $S = S^T$ is any symmetric matrix, then $K = S + c\, I > 0$ is positive definite if $c$ is sufficiently large.


3.5.13.  Prove  that  every  symmetric  matrix  S  =  K  +  N  can  be  written  as  the  sum  of  a  positive 
definite  matrix  K  and  a  negative  definite  matrix  N.  Hint :  Use  Exercise  3.5.12(b). 


0  3.5.14.  (a) Prove that every regular symmetric matrix can be decomposed as a linear combination

$$K = d_1\, \mathbf{l}_1 \mathbf{l}_1^T + d_2\, \mathbf{l}_2 \mathbf{l}_2^T + \cdots + d_n\, \mathbf{l}_n \mathbf{l}_n^T \tag{3.77}$$

of symmetric rank 1 matrices, as in Exercise 1.8.15, where $\mathbf{l}_1, \ldots, \mathbf{l}_n$ are the columns of the lower unitriangular matrix $L$ and $d_1, \ldots, d_n$ are the pivots, i.e., the diagonal entries of $D$. Hint: See Exercise 1.2.34. (b) Decompose $\begin{pmatrix} 4 & -1 \\ -1 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}$ in this manner.


♥ 3.5.15. There is an alternative criterion for positive definiteness based on subdeterminants of the matrix. The $2 \times 2$ version already appears in (3.70). (a) Prove that a $3 \times 3$ matrix $K = \begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix}$ is positive definite if and only if $a > 0$, $ad - b^2 > 0$, and $\det K > 0$. (b) Prove the general version: an $n \times n$ matrix $K > 0$ is positive definite if and only if its upper left square $k \times k$ submatrices have positive determinant for all $k = 1, \ldots, n$. Hint: See Exercise 1.9.17.

♦ 3.5.16. Let $K$ be a symmetric matrix. Prove that if a non-positive diagonal entry appears anywhere (not necessarily in the pivot position) in the matrix during Regular Gaussian Elimination, then $K$ is not positive definite.

♦ 3.5.17. Formulate a determinantal criterion similar to that in Exercise 3.5.15 for negative definite matrices. Write out the $2 \times 2$ and $3 \times 3$ cases explicitly.

3.5.18.  True  or  false:  A  negative  definite  matrix  must  have  negative  trace  and  negative 
determinant. 


The  Cholesky  Factorization 


The  identity  (3.73)  shows  us  how  to  write  an  arbitrary  regular  quadratic  form  q(x)  as 
a  linear  combination  of  squares.  We  can  push  this  result  slightly  further  in  the  positive 
definite  case.  Since  each  pivot  di  is  positive,  we  can  write  the  diagonal  quadratic  form 
(3.74)  as  a  sum  of  pure  squares: 

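$$d_1 y_1^2 + \cdots + d_n y_n^2 = \bigl(\sqrt{d_1}\,y_1\bigr)^2 + \cdots + \bigl(\sqrt{d_n}\,y_n\bigr)^2 = z_1^2 + \cdots + z_n^2,$$

where $z_i = \sqrt{d_i}\,y_i$. In matrix form, we are writing

$$q(\mathbf{y}) = \mathbf{y}^T D\,\mathbf{y} = \mathbf{z}^T \mathbf{z} = \|\mathbf{z}\|^2, \qquad \text{where} \quad \mathbf{z} = S\,\mathbf{y}, \quad \text{with} \quad S = \operatorname{diag}\bigl(\sqrt{d_1}, \ldots, \sqrt{d_n}\bigr).$$

Since $D = S^2$, the matrix $S$ can be thought of as a “square root” of the diagonal matrix $D$. Substituting back into (1.58), we deduce the Cholesky factorization

$$K = L D L^T = L S S^T L^T = M M^T, \qquad \text{where} \quad M = L S, \tag{3.78}$$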


of a positive definite matrix, first proposed by the early twentieth-century French geographer André-Louis Cholesky for solving problems in geodetic surveying. Note that $M$ is a lower triangular matrix with all positive diagonal entries, namely the square roots of the pivots: $m_{ii} = \sqrt{d_i}$. Applying the Cholesky factorization to the corresponding quadratic form produces

$$q(\mathbf{x}) = \mathbf{x}^T K\,\mathbf{x} = \mathbf{x}^T M M^T \mathbf{x} = \mathbf{z}^T \mathbf{z} = \|\mathbf{z}\|^2, \qquad \text{where} \quad \mathbf{z} = M^T \mathbf{x}. \tag{3.79}$$

We can interpret (3.79) as a change of variables from $\mathbf{x}$ to $\mathbf{z}$ that converts an arbitrary inner product norm, as defined by the square root of the positive definite quadratic form $q(\mathbf{x})$, into the standard Euclidean norm $\|\mathbf{z}\|$.


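Example 3.46. For the matrix $K = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 6 & 0 \\ -1 & 0 & 9 \end{pmatrix}$ considered in Example 3.44, the Cholesky formula (3.78) gives $K = M M^T$, where

$$M = L S = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & \sqrt{6} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & \sqrt{2} & 0 \\ -1 & \sqrt{2} & \sqrt{6} \end{pmatrix}.$$

The associated quadratic function can then be written as a sum of pure squares:

$$q(\mathbf{x}) = x_1^2 + 4\,x_1 x_2 - 2\,x_1 x_3 + 6\,x_2^2 + 9\,x_3^2 = z_1^2 + z_2^2 + z_3^2,$$

where $\mathbf{z} = M^T \mathbf{x}$, or, explicitly, $z_1 = x_1 + 2\,x_2 - x_3$, $z_2 = \sqrt{2}\,x_2 + \sqrt{2}\,x_3$, $z_3 = \sqrt{6}\,x_3$.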

Exercises 

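3.5.19. Find the Cholesky factorizations of the following matrices:

(a) $\begin{pmatrix} 3 & -2 \\ -2 & 2 \end{pmatrix}$, (b) $\begin{pmatrix} 4 & -12 \\ -12 & 45 \end{pmatrix}$, (c) $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & -2 \\ 1 & -2 & 14 \end{pmatrix}$, (d) $\begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$, (e) $\begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}$.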


3.5.20.  Which  of  the  matrices  in  Exercise  3.5.1  have  a  Cholesky  factorization?  For  those  that 
do,  write  out  the  factorization. 


3.5.21. Write the following positive definite quadratic forms as a sum of pure squares, as in (3.79): (a) $16\,x_1^2 + 25\,x_2^2$, (b) $x_1^2 - 2\,x_1 x_2 + 4\,x_2^2$, (c) $5\,x_1^2 + 4\,x_1 x_2 + 3\,x_2^2$, (d) $3\,x_1^2 - 2\,x_1 x_2 - 2\,x_1 x_3 + 2\,x_2^2 + 6\,x_3^2$, (e) $x_1^2 + x_1 x_2 + x_2^2 + x_2 x_3 + x_3^2$, (f) $4\,x_1^2 - 2\,x_1 x_2 - 4\,x_1 x_3 + \tfrac{1}{2}\,x_2^2 - x_2 x_3 + 6\,x_3^2$, (g) $3\,x_1^2 + 2\,x_1 x_2 + 3\,x_2^2 + 2\,x_2 x_3 + 3\,x_3^2 + 2\,x_3 x_4 + 3\,x_4^2$.


3.6  Complex  Vector  Spaces 

Although physical applications ultimately require real answers, complex numbers and complex vector spaces play an extremely useful, if not essential, role in the intervening analysis. Particularly in the description of periodic phenomena, complex numbers and complex exponentials help to simplify complicated trigonometric formulas. Complex variable methods are ubiquitous in electrical engineering, Fourier analysis, potential theory, fluid mechanics, electromagnetism, and many other applied fields, [49, 79]. In quantum mechanics, the basic physical quantities are complex-valued wave functions, [54]. Moreover, the Schrödinger equation, which governs quantum dynamics, is an inherently complex partial differential equation.

Figure 3.7. Complex Numbers.

In  this  section,  we  survey  the  principal  properties  of  complex  numbers  and  complex 
vector  spaces.  Most  of  the  constructions  are  straightforward  adaptations  of  their  real 
counterparts,  and  so  will  not  be  dwelled  on  at  length.  The  one  exception  is  the  complex 
version  of  an  inner  product,  which  does  introduce  some  novelties  not  found  in  its  simpler 
real  sibling. 


Complex  Numbers 


Recall that a complex number is an expression of the form $z = x + i\,y$, where $x, y \in \mathbb{R}$ are real and† $i = \sqrt{-1}$. The set of all complex numbers (scalars) is denoted by $\mathbb{C}$. We call $x = \operatorname{Re} z$ the real part of $z$ and $y = \operatorname{Im} z$ the imaginary part of $z = x + i\,y$. (Note: The imaginary part is the real number $y$, not $i\,y$.) A real number $x$ is merely a complex number with zero imaginary part, $\operatorname{Im} z = 0$, and so we may regard $\mathbb{R} \subset \mathbb{C}$. Complex addition and multiplication are based on simple adaptations of the rules of real arithmetic to include the identity $i^2 = -1$, and so


(x  +  i  y)  +  (u  +  i  v)  =  (x  +  u)  +  i  (y  +  v), 

(x  +  i y)  (u  +  i  v)  =  (xu  —  yv )  +  i  (xv  +  yu). 


(3.80) 


Complex  numbers  enjoy  all  the  usual  laws  of  real  addition  and  multiplication,  including 
commutativity:  zw  =  wz. 

We can identify a complex number $x + i\,y$ with a vector $(x, y)^T \in \mathbb{R}^2$ in the real plane. For this reason, $\mathbb{C}$ is sometimes referred to as the complex plane. Complex addition (3.80) corresponds to vector addition, but complex multiplication does not have a readily identifiable vector counterpart.

Another  useful  operation  on  complex  numbers  is  that  of  complex  conjugation. 


Definition 3.47. The complex conjugate of $z = x + i\,y$ is $\bar z = x - i\,y$, whereby $\operatorname{Re} \bar z = \operatorname{Re} z$, while $\operatorname{Im} \bar z = -\operatorname{Im} z$.


† To avoid confusion with the symbol for current, electrical engineers prefer to use $j$ to indicate the imaginary unit.

Geometrically, the complex conjugate of $z$ is obtained by reflecting the corresponding vector through the real axis, as illustrated in Figure 3.7. In particular $\bar z = z$ if and only if $z$ is real. Note that

$$\operatorname{Re} z = \frac{z + \bar z}{2}, \qquad \operatorname{Im} z = \frac{z - \bar z}{2\,i}. \tag{3.81}$$

Complex conjugation is compatible with complex arithmetic:

$$\overline{z + w} = \bar z + \bar w, \qquad \overline{z\,w} = \bar z\,\bar w.$$


In particular, the product of a complex number and its conjugate,

$$z\,\bar z = (x + i\,y)(x - i\,y) = x^2 + y^2, \tag{3.82}$$

is real and non-negative. Its square root is known as the modulus or norm of the complex number $z = x + i\,y$, and written

$$|z| = \sqrt{x^2 + y^2}. \tag{3.83}$$


Note that $|z| \ge 0$, with $|z| = 0$ if and only if $z = 0$. The modulus $|z|$ generalizes the absolute value of a real number, and coincides with the standard Euclidean norm in the $xy$-plane, which implies the validity of the triangle inequality

$$|z + w| \le |z| + |w|. \tag{3.84}$$

Equation (3.82) can be rewritten in terms of the modulus as

$$z\,\bar z = |z|^2. \tag{3.85}$$


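Rearranging the factors, we deduce the formula for the reciprocal of a nonzero complex number:

$$\frac{1}{z} = \frac{\bar z}{|z|^2}, \quad z \ne 0, \qquad \text{or, equivalently,} \qquad \frac{1}{x + i\,y} = \frac{x - i\,y}{x^2 + y^2}. \tag{3.86}$$

The general formula for complex division,

$$\frac{w}{z} = \frac{w\,\bar z}{|z|^2}, \qquad \text{or} \qquad \frac{u + i\,v}{x + i\,y} = \frac{(x u + y v) + i\,(x v - y u)}{x^2 + y^2}, \tag{3.87}$$

is an immediate consequence.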
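The reciprocal and division formulas (3.86) and (3.87) are easy to test against a language's built-in complex arithmetic. The sketch below uses Python's built-in `complex` type; the helper names `reciprocal` and `divide` and the sample values are assumptions made here for illustration.

```python
def reciprocal(z):
    """Return 1/z via (3.86): conj(z) / |z|^2."""
    r2 = z.real ** 2 + z.imag ** 2       # |z|^2 = x^2 + y^2
    if r2 == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    return z.conjugate() / r2

def divide(w, z):
    """Return w/z via the explicit real form of (3.87)."""
    x, y = z.real, z.imag
    u, v = w.real, w.imag
    d = x * x + y * y
    if d == 0.0:
        raise ZeroDivisionError("division by zero")
    return complex((x * u + y * v) / d, (x * v - y * u) / d)

z = complex(3, 4)
w = complex(1, -2)

print(abs(z))                     # modulus |z| of (3.83)
print(z * z.conjugate())          # real and non-negative, as in (3.82)
print(reciprocal(z), 1 / z)       # both routes give the same reciprocal
print(divide(w, z), w / z)        # both routes give the same quotient
```

Writing the quotient in terms of real and imaginary parts, as `divide` does, is exactly what (3.87) states; the built-in operators serve only as an independent check.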

The modulus of a complex number,

$$r = |z| = \sqrt{x^2 + y^2},$$

is one component of its polar coordinate representation

$$x = r \cos\theta, \quad y = r \sin\theta \qquad \text{or} \qquad z = r\,(\cos\theta + i \sin\theta). \tag{3.88}$$

The polar angle, which measures the angle that the line connecting $z$ to the origin makes with the horizontal axis, is known as the phase, and written

$$\theta = \operatorname{ph} z. \tag{3.89}$$

As such, the phase is defined only up to an integer multiple of $2\pi$. The more common term for the angle is the argument, written $\arg z = \operatorname{ph} z$. However, we prefer to use “phase” throughout this text, in part to avoid confusion with the argument $z$ of a function $f(z)$.

We note that the modulus and phase of a product of complex numbers can be readily computed:

$$|z\,w| = |z|\,|w|, \qquad \operatorname{ph}(z\,w) = \operatorname{ph} z + \operatorname{ph} w. \tag{3.90}$$

Complex conjugation preserves the modulus, but reverses the sign of the phase:

$$|\bar z| = |z|, \qquad \operatorname{ph} \bar z = -\operatorname{ph} z. \tag{3.91}$$
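Because the phase is defined only up to multiples of $2\pi$, the identity $\operatorname{ph}(zw) = \operatorname{ph} z + \operatorname{ph} w$ has to be read modulo $2\pi$ when a program returns a principal value. The following Python sketch illustrates this with the standard `cmath` module, whose `polar` function returns a phase in $[-\pi, \pi]$; the helper `same_angle` and the sample numbers are assumptions introduced here.

```python
import cmath
import math

def same_angle(a, b, tol=1e-12):
    """True when the angles a and b differ by an integer multiple of 2*pi."""
    return abs(cmath.exp(1j * a) - cmath.exp(1j * b)) < tol

z = complex(-1, 1)      # principal phase 3*pi/4
w = complex(0, 1)       # principal phase pi/2

r_z, ph_z = cmath.polar(z)
r_w, ph_w = cmath.polar(w)
r_zw, ph_zw = cmath.polar(z * w)

# (3.90): moduli multiply, phases add (modulo 2*pi).
print(math.isclose(r_zw, r_z * r_w))
print(same_angle(ph_zw, ph_z + ph_w))
# The principal values themselves differ by 2*pi in this example.
print(math.isclose(ph_zw, ph_z + ph_w))

# (3.91): conjugation keeps the modulus and flips the sign of the phase.
r_c, ph_c = cmath.polar(z.conjugate())
print(math.isclose(r_c, r_z), math.isclose(ph_c, -ph_z))
```

Here $\operatorname{ph} z + \operatorname{ph} w = 5\pi/4$, while the principal phase of $zw = -1 - i$ is $-3\pi/4$; the two agree only after subtracting $2\pi$.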


One of the most profound formulas in all of mathematics is Euler’s formula

$$e^{i\theta} = \cos\theta + i \sin\theta, \tag{3.92}$$

relating the complex exponential with the real sine and cosine functions. It has a variety of mathematical justifications; see Exercise 3.6.23 for one that is based on comparing power series. Euler’s formula can be used to compactly rewrite the polar form (3.88) of a complex number as

$$z = r\,e^{i\theta}, \qquad \text{where} \quad r = |z|, \quad \theta = \operatorname{ph} z. \tag{3.93}$$


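The complex conjugation identity

$$e^{-i\theta} = \cos(-\theta) + i \sin(-\theta) = \cos\theta - i \sin\theta = \overline{e^{i\theta}}$$

permits us to express the basic trigonometric functions in terms of complex exponentials:

$$\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}, \qquad \sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2\,i}. \tag{3.94}$$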


These  formulas  are  very  useful  when  working  with  trigonometric  identities  and  integrals. 

The exponential of a general complex number is easily derived from the Euler formula and the standard properties of the exponential function, which carry over unaltered to the complex domain; thus,

$$e^z = e^{x + i y} = e^x e^{i y} = e^x \cos y + i\,e^x \sin y. \tag{3.95}$$

Note that $e^{2\pi i} = 1$, and hence the exponential function is periodic,

$$e^{z + 2\pi i} = e^z, \tag{3.96}$$

with imaginary period $2\pi i$, indicative of the periodicity of the trigonometric functions in Euler’s formula.
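Formulas (3.92) through (3.96) can be spot-checked in floating point. The Python sketch below compares the library routine `cmath.exp` with the right-hand sides of Euler's formula and of (3.95); the function name `exp_via_euler` and the sample values of $\theta$ and $z$ are arbitrary choices made here, and agreement is only up to rounding error.

```python
import cmath
import math

def exp_via_euler(z):
    """e^z = e^x (cos y + i sin y), the right-hand side of (3.95)."""
    return math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))

theta = 0.7                        # arbitrary real angle
z = complex(0.5, -1.25)            # arbitrary complex number

# Euler's formula (3.92).
euler_gap = abs(cmath.exp(1j * theta)
                - complex(math.cos(theta), math.sin(theta)))

# Cosine and sine recovered from complex exponentials, (3.94).
cos_from_exp = (cmath.exp(1j * theta) + cmath.exp(-1j * theta)) / 2
sin_from_exp = (cmath.exp(1j * theta) - cmath.exp(-1j * theta)) / 2j

# General exponential (3.95) and its 2*pi*i periodicity (3.96).
exp_gap = abs(exp_via_euler(z) - cmath.exp(z))
period_gap = abs(cmath.exp(z + 2j * math.pi) - cmath.exp(z))

print(euler_gap, exp_gap, period_gap)      # all at rounding-error level
print(cos_from_exp, math.cos(theta))
print(sin_from_exp, math.sin(theta))
```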


Exercises 

3.6.1. Write down a single equation that relates the five most important numbers in mathematics, which are $0$, $1$, $e$, $\pi$, and $i$.

3.6.2. For any integer $k$, prove that $e^{k\pi i} = (-1)^k$.

3.6.3. Is the formula $1^z = 1$ valid for all complex values of $z$?

3.6.4. What is wrong with the calculation $e^{2a\pi i} = (e^{2\pi i})^a = 1^a = 1$?

3.6.5. (a) Write $i$ in phase-modulus form. (b) Use this expression to find $\sqrt{i}$, i.e., a complex number $z$ such that $z^2 = i$. Can you find a second square root? (c) Find explicit formulas for the three third roots and four fourth roots of $i$.

3.6.6. In Figure 3.7, where would you place the point $1/z$?

3.6.7. (a) If $z$ moves counterclockwise around a circle of radius $r$ in the complex plane, around which circle and in which direction does $w = 1/z$ move? (b) What about $w = \bar z$? (c) What if the circle is not centered at the origin?


♦ 3.6.8. Show that $-|z| \le \operatorname{Re} z \le |z|$ and $-|z| \le \operatorname{Im} z \le |z|$.

♦ 3.6.9. Prove that if $\varphi$ is real, then $\operatorname{Re}(e^{i\varphi} z) \le |z|$, with equality if and only if $\varphi = -\operatorname{ph} z$.

3.6.10. Prove the identities in (3.90) and (3.91).

3.6.11. Prove $\operatorname{ph}(z/w) = \operatorname{ph} z - \operatorname{ph} w = \operatorname{ph}(z\,\bar w)$ is equal to the angle between the vectors representing $z$ and $w$.

3.6.12. The phase of a complex number $z = x + i\,y$ is often written as $\operatorname{ph} z = \tan^{-1}(y/x)$. Explain why this formula is ambiguous, and does not uniquely define $\operatorname{ph} z$.

3.6.13. Show that if we identify the complex numbers $z, w$ with vectors in the plane, then their Euclidean dot product is equal to $\operatorname{Re}(z\,\bar w)$.

3.6.14. (a) Prove that the complex numbers $z$ and $w$ correspond to orthogonal vectors in $\mathbb{R}^2$ if and only if $\operatorname{Re}(z\,\bar w) = 0$. (b) Prove that $z$ and $i\,z$ are always orthogonal.

3.6.15. Prove that $e^{z+w} = e^z e^w$. Conclude that $e^{mz} = (e^z)^m$ whenever $m$ is an integer.

3.6.16. (a) Use the formula $e^{2i\theta} = (e^{i\theta})^2$ to deduce the well-known trigonometric identities for $\cos 2\theta$ and $\sin 2\theta$. (b) Derive the corresponding identities for $\cos 3\theta$ and $\sin 3\theta$. (c) Write down the explicit identities for $\cos m\theta$ and $\sin m\theta$ as polynomials in $\cos\theta$ and $\sin\theta$. Hint: Apply the Binomial Formula to $(e^{i\theta})^m$.


♦ 3.6.17. Use complex exponentials to prove the identity $\cos\theta + \cos\varphi = 2 \cos\dfrac{\theta - \varphi}{2} \cos\dfrac{\theta + \varphi}{2}$.


3.6.18.  Prove  that  if  z  =  x  +  i  y,  then 


3.6.19. The formulas $\cos z = \dfrac{e^{iz} + e^{-iz}}{2}$ and $\sin z = \dfrac{e^{iz} - e^{-iz}}{2\,i}$ serve to define the basic complex trigonometric functions. Write out the formulas for their real and imaginary parts in terms of $z = x + i\,y$, and show that $\cos z$ and $\sin z$ reduce to their usual real forms when $z = x$ is real. What do they become when $z = i\,y$ is purely imaginary?


3.6.20. The complex hyperbolic functions are defined as $\cosh z = \dfrac{e^z + e^{-z}}{2}$, $\sinh z = \dfrac{e^z - e^{-z}}{2}$. (a) Write out the formulas for their real and imaginary parts in terms of $z = x + i\,y$. (b) Prove that $\cos i z = \cosh z$ and $\sin i z = i \sinh z$.


♥ 3.6.21. Generalizing Example 2.17c, by a trigonometric polynomial of degree $\le n$, we mean a function $T(\theta) = \sum_{0 \le j + k \le n} c_{jk}\,(\cos\theta)^j (\sin\theta)^k$ in the powers of the sine and cosine functions up to degree $n$. (a) Use formula (3.94) to prove that every trigonometric polynomial of degree $\le n$ can be written as a complex linear combination of the $2n + 1$ complex exponentials $e^{-n i\theta}, \ldots, e^{-i\theta}, e^{0 i\theta} = 1, e^{i\theta}, e^{2 i\theta}, \ldots, e^{n i\theta}$. (b) Prove that every trigonometric polynomial of degree $\le n$ can be written as a real linear combination of the trigonometric functions $1, \cos\theta, \sin\theta, \cos 2\theta, \sin 2\theta, \ldots, \cos n\theta, \sin n\theta$. (c) Write out the following trigonometric polynomials in both of the preceding forms: (i) $\cos^2\theta$, (ii) $\cos\theta \sin\theta$, (iii) $\cos^3\theta$, (iv) $\sin^4\theta$, (v) $\cos^2\theta \sin^2\theta$.

♦ 3.6.22. Write out the real and imaginary parts of the power function $x^c$ with complex exponent $c = a + i\,b \in \mathbb{C}$.

♦ 3.6.23. Write the power series expansions for $e^{ix}$. Prove that the real terms give the power series for $\cos x$, while the imaginary terms give that of $\sin x$. Use this identification to justify Euler’s formula (3.92).

♦ 3.6.24. The derivative of a complex-valued function $f(x) = u(x) + i\,v(x)$, depending on a real variable $x$, is given by $f'(x) = u'(x) + i\,v'(x)$. (a) Prove that if $\lambda = \mu + i\,\nu$ is any complex scalar, then $\dfrac{d}{dx}\,e^{\lambda x} = \lambda\,e^{\lambda x}$. (b) Prove, conversely, $\displaystyle\int_a^b e^{\lambda x}\,dx = \frac{1}{\lambda}\bigl(e^{\lambda b} - e^{\lambda a}\bigr)$, provided $\lambda \ne 0$.


3.6.25. Use the complex trigonometric formulas (3.94) and Exercise 3.6.24 to evaluate the following trigonometric integrals: (a) $\int \cos^2 x\,dx$, (b) $\int \sin^2 x\,dx$, (c) $\int \cos x \sin x\,dx$, (d) $\int \cos 3x \sin 5x\,dx$. How did you calculate them in first-year calculus? If you’re not convinced this method is easier, try the more complicated integrals (e) $\int \cos^4 x\,dx$, (f) $\int \sin^4 x\,dx$, (g) $\int \cos^2 x \sin^2 x\,dx$, (h) $\int \cos 3x \sin 5x \cos 7x\,dx$.


Complex  Vector  Spaces  and  Inner  Products 


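A complex vector space is defined in exactly the same manner as its real counterpart, as in Definition 2.1, the only difference being that we replace real scalars by complex scalars. The most basic example is the $n$-dimensional complex vector space $\mathbb{C}^n$ consisting of all column vectors $\mathbf{z} = (z_1, z_2, \ldots, z_n)^T$ that have $n$ complex entries $z_1, \ldots, z_n \in \mathbb{C}$. Vector addition and scalar multiplication are defined in the obvious manner, and verification of each of the vector space axioms is immediate.

We can write any complex vector $\mathbf{z} = \mathbf{x} + i\,\mathbf{y} \in \mathbb{C}^n$ as a linear combination of two real vectors $\mathbf{x} = \operatorname{Re} \mathbf{z}$ and $\mathbf{y} = \operatorname{Im} \mathbf{z} \in \mathbb{R}^n$ called its real and imaginary parts. Its complex conjugate $\bar{\mathbf{z}} = \mathbf{x} - i\,\mathbf{y}$ is obtained by taking the complex conjugates of its individual entries. Thus, for example, if

$$\mathbf{z} = \begin{pmatrix} 1 + 2i \\ -3 \\ 5i \end{pmatrix}, \qquad \text{then} \qquad \operatorname{Re} \mathbf{z} = \begin{pmatrix} 1 \\ -3 \\ 0 \end{pmatrix}, \qquad \operatorname{Im} \mathbf{z} = \begin{pmatrix} 2 \\ 0 \\ 5 \end{pmatrix},$$

and so its complex conjugate is $\bar{\mathbf{z}} = (1 - 2i, -3, -5i)^T$. In particular, $\mathbf{z} \in \mathbb{R}^n \subset \mathbb{C}^n$ is a real vector if and only if $\mathbf{z} = \bar{\mathbf{z}}$.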

Most of the vector space concepts we developed in the real domain, including span, linear independence, basis, and dimension, can be straightforwardly extended to the complex regime. The one exception is the concept of an inner product, which requires a little thought. In analysis, the primary applications of inner products and norms rely on the associated inequalities: Cauchy-Schwarz and triangle. But there is no natural ordering of the complex numbers, and so one cannot assign a meaning to a complex inequality like $z < w$. Inequalities make sense only in the real domain, and so the norm of a complex vector should still be real and non-negative. With this in mind, the naive idea of simply summing the squares of the entries of a complex vector will not define a norm on $\mathbb{C}^n$, since the result will typically be complex. Moreover, some nonzero complex vectors, e.g., $(1, i)^T$, would then have zero “norm”.

The correct definition is modeled on the formula

$$|z| = \sqrt{z\,\bar z},$$

which defines the modulus of a complex scalar $z \in \mathbb{C}$. If, in analogy with the real definition (3.7), the quantity inside the square root should represent the inner product of $z$ with itself, then we should define the “dot product” between two complex numbers to be

$$z \cdot w = z\,\bar w, \qquad \text{so that} \qquad z \cdot z = z\,\bar z = |z|^2.$$

Writing out the formula when $z = x + i\,y$ and $w = u + i\,v$, we obtain

$$z \cdot w = z\,\bar w = (x + i\,y)(u - i\,v) = (x u + y v) + i\,(y u - x v). \tag{3.97}$$

Thus, the dot product of two complex numbers is, in general, complex. The real part of $z \cdot w$ is, in fact, the Euclidean dot product between the corresponding vectors in $\mathbb{R}^2$, while its imaginary part is, interestingly, their scalar cross product, cf. (3.22).

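The vector version of this construction is named after the nineteenth-century French mathematician Charles Hermite, and called the Hermitian dot product on $\mathbb{C}^n$. It has the explicit formula

$$\mathbf{z} \cdot \mathbf{w} = \mathbf{z}^T \bar{\mathbf{w}} = z_1 \bar w_1 + z_2 \bar w_2 + \cdots + z_n \bar w_n, \qquad \text{for} \quad \mathbf{z} = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix}, \quad \mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}. \tag{3.98}$$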

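Pay attention to the fact that we must apply complex conjugation to all the entries of the second vector. For example, if

$$\mathbf{z} = \begin{pmatrix} 1 + i \\ 3 + 2i \end{pmatrix}, \quad \mathbf{w} = \begin{pmatrix} 1 + 2i \\ i \end{pmatrix}, \qquad \text{then} \qquad \mathbf{z} \cdot \mathbf{w} = (1 + i)(1 - 2i) + (3 + 2i)(-i) = 5 - 4i.$$

On the other hand,

$$\mathbf{w} \cdot \mathbf{z} = (1 + 2i)(1 - i) + i\,(3 - 2i) = 5 + 4i,$$

and we conclude that the Hermitian dot product is not symmetric. Indeed, reversing the order of the vectors conjugates their dot product:

$$\mathbf{w} \cdot \mathbf{z} = \overline{\mathbf{z} \cdot \mathbf{w}}.$$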


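This is an unexpected complication, but it does have the desired effect that the induced norm, namely

$$0 \le \|\mathbf{z}\| = \sqrt{\mathbf{z} \cdot \mathbf{z}} = \sqrt{\mathbf{z}^T \bar{\mathbf{z}}} = \sqrt{|z_1|^2 + \cdots + |z_n|^2}, \tag{3.99}$$

is strictly positive for all $0 \ne \mathbf{z} \in \mathbb{C}^n$. For example, if

$$\mathbf{z} = \begin{pmatrix} 1 + 3i \\ -2i \\ -5 \end{pmatrix}, \qquad \text{then} \qquad \|\mathbf{z}\| = \sqrt{|1 + 3i|^2 + |-2i|^2 + |-5|^2} = \sqrt{39}.$$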
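The Hermitian dot product (3.98) and its norm (3.99) translate directly into code. The Python sketch below reproduces the two numerical examples above and also shows why the naive sum of squares fails for $(1, i)^T$. The names `hdot` and `hnorm` are assumptions introduced here, and vectors are represented simply as lists of Python numbers.

```python
import math

def hdot(z, w):
    """Hermitian dot product z . w = sum of z_k * conj(w_k), as in (3.98)."""
    if len(z) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(zk * wk.conjugate() for zk, wk in zip(z, w))

def hnorm(z):
    """Hermitian norm (3.99); hdot(z, z) is real up to rounding."""
    return math.sqrt(hdot(z, z).real)

z = [1 + 1j, 3 + 2j]
w = [1 + 2j, 1j]

print(hdot(z, w))                      # (5-4j)
print(hdot(w, z))                      # (5+4j), the complex conjugate
print(hnorm([1 + 3j, -2j, -5]) ** 2)   # 39 up to rounding

# Naive "sum of squares" without conjugation: zero for a nonzero vector.
naive = sum(c * c for c in [1, 1j])
print(naive)                           # 0j
print(hnorm([1, 1j]))                  # sqrt(2), as it should be
```

The conjugate is applied to the entries of the second argument, matching the convention of (3.98); some software libraries conjugate the first argument instead, so the convention should be checked before reusing a library routine.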


The Hermitian dot product is well behaved under complex vector addition:

$$(\mathbf{z} + \tilde{\mathbf{z}}) \cdot \mathbf{w} = \mathbf{z} \cdot \mathbf{w} + \tilde{\mathbf{z}} \cdot \mathbf{w}, \qquad \mathbf{z} \cdot (\mathbf{w} + \tilde{\mathbf{w}}) = \mathbf{z} \cdot \mathbf{w} + \mathbf{z} \cdot \tilde{\mathbf{w}}.$$

However, while complex scalar multiples can be extracted from the first vector without alteration, when they multiply the second vector, they emerge as complex conjugates:

$$(c\,\mathbf{z}) \cdot \mathbf{w} = c\,(\mathbf{z} \cdot \mathbf{w}), \qquad \mathbf{z} \cdot (c\,\mathbf{w}) = \bar c\,(\mathbf{z} \cdot \mathbf{w}), \qquad c \in \mathbb{C}.$$

Thus, the Hermitian dot product is not bilinear in the strict sense, but satisfies something that, for lack of a better name, is known as sesquilinearity.

The general definition of an inner product on a complex vector space is modeled on the preceding properties of the Hermitian dot product.

Definition 3.48. An inner product on the complex vector space $V$ is a pairing that takes two vectors $\mathbf{v}, \mathbf{w} \in V$ and produces a complex number $\langle \mathbf{v}, \mathbf{w} \rangle \in \mathbb{C}$, subject to the following requirements, for $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$, and $c, d \in \mathbb{C}$:

(i) Sesquilinearity:

$$\langle c\,\mathbf{u} + d\,\mathbf{v}, \mathbf{w} \rangle = c\,\langle \mathbf{u}, \mathbf{w} \rangle + d\,\langle \mathbf{v}, \mathbf{w} \rangle, \qquad \langle \mathbf{u}, c\,\mathbf{v} + d\,\mathbf{w} \rangle = \bar c\,\langle \mathbf{u}, \mathbf{v} \rangle + \bar d\,\langle \mathbf{u}, \mathbf{w} \rangle. \tag{3.100}$$

(ii) Conjugate Symmetry:

$$\langle \mathbf{v}, \mathbf{w} \rangle = \overline{\langle \mathbf{w}, \mathbf{v} \rangle}. \tag{3.101}$$

(iii) Positivity:

$$\|\mathbf{v}\|^2 = \langle \mathbf{v}, \mathbf{v} \rangle \ge 0, \quad \text{and} \quad \langle \mathbf{v}, \mathbf{v} \rangle = 0 \quad \text{if and only if} \quad \mathbf{v} = \mathbf{0}. \tag{3.102}$$

Thus, when dealing with a complex inner product space, one must pay careful attention to the complex conjugate that appears when the second argument in the inner product is multiplied by a complex scalar, as well as the complex conjugate that appears when the order of the two arguments is reversed. But, once this initial complication has been properly taken into account, the further properties of the inner product carry over directly from the real domain. Exercise 3.6.45 contains the formula for a general inner product on the complex vector space $\mathbb{C}^n$.


Theorem 3.49. The Cauchy-Schwarz inequality,

$$|\langle \mathbf{v}, \mathbf{w} \rangle| \le \|\mathbf{v}\|\,\|\mathbf{w}\|, \tag{3.103}$$

with $|\cdot|$ now denoting the complex modulus, and the triangle inequality

$$\|\mathbf{v} + \mathbf{w}\| \le \|\mathbf{v}\| + \|\mathbf{w}\| \tag{3.104}$$

are both valid on an arbitrary complex inner product space.

The proof of (3.103-104) is modeled on the real case, and the details are left to the reader.


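Example 3.50. The vectors $\mathbf{v} = (1 + i, 2i, -3)^T$, $\mathbf{w} = (2 - i, 1, 2 + 2i)^T$, satisfy

$$\|\mathbf{v}\| = \sqrt{2 + 4 + 9} = \sqrt{15}, \qquad \|\mathbf{w}\| = \sqrt{5 + 1 + 8} = \sqrt{14},$$

$$\mathbf{v} \cdot \mathbf{w} = (1 + i)(2 + i) + 2i + (-3)(2 - 2i) = -5 + 11i.$$

Thus, the Cauchy-Schwarz inequality reads

$$|\langle \mathbf{v}, \mathbf{w} \rangle| = |-5 + 11i| = \sqrt{146} \le \sqrt{210} = \sqrt{15}\,\sqrt{14} = \|\mathbf{v}\|\,\|\mathbf{w}\|.$$

Similarly, the triangle inequality tells us that

$$\|\mathbf{v} + \mathbf{w}\| = \|(3, 1 + 2i, -1 + 2i)^T\| = \sqrt{9 + 5 + 5} = \sqrt{19} \le \sqrt{15} + \sqrt{14} = \|\mathbf{v}\| + \|\mathbf{w}\|.$$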

Example 3.51. Let $C^0[-\pi, \pi]$ denote the complex vector space consisting of all complex-valued continuous functions $f(x) = u(x) + i\,v(x)$ depending upon the real variable $-\pi \le x \le \pi$. The Hermitian $L^2$ inner product on $C^0[-\pi, \pi]$ is defined as

$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\,\overline{g(x)}\,dx, \tag{3.105}$$

i.e., the integral of $f$ times the complex conjugate of $g$, with corresponding norm

$$\|f\| = \sqrt{\int_{-\pi}^{\pi} |f(x)|^2\,dx} = \sqrt{\int_{-\pi}^{\pi} \bigl[\,u(x)^2 + v(x)^2\,\bigr]\,dx}. \tag{3.106}$$

The reader can verify that (3.105) satisfies the Hermitian inner product axioms.

In particular, if $k, l$ are integers, then the inner product of the complex exponential functions $e^{ikx}$ and $e^{ilx}$ is

$$\langle e^{ikx}, e^{ilx} \rangle = \int_{-\pi}^{\pi} e^{ikx}\,\overline{e^{ilx}}\,dx = \int_{-\pi}^{\pi} e^{ikx} e^{-ilx}\,dx = \int_{-\pi}^{\pi} e^{i(k-l)x}\,dx = \begin{cases} 2\pi, & k = l, \\[4pt] \dfrac{e^{i(k-l)x}}{i\,(k - l)}\,\Bigg|_{x = -\pi}^{\pi} = 0, & k \ne l. \end{cases}$$

We conclude that when $k \ne l$, the complex exponentials $e^{ikx}$ and $e^{ilx}$ are orthogonal, since their inner product is zero. The complex formulation of Fourier analysis, [61, 77], is founded on this key example.
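The orthogonality of the complex exponentials under (3.105) can also be observed numerically. The sketch below approximates the integral by a midpoint rule; the function names `l2_inner` and `cexp`, the number of sample points, and the chosen frequencies are all assumptions made here for illustration, and the quadrature is a stand-in for the exact integral evaluated above.

```python
import cmath
import math

def l2_inner(f, g, n=2000):
    """Approximate <f, g> = integral over [-pi, pi] of f(x) * conj(g(x)) dx.

    Uses the midpoint rule with n equal subintervals.
    """
    h = 2 * math.pi / n
    total = 0j
    for m in range(n):
        x = -math.pi + (m + 0.5) * h
        total += f(x) * g(x).conjugate()
    return total * h

def cexp(k):
    """Return the function x -> e^{i k x}."""
    return lambda x: cmath.exp(1j * k * x)

same = l2_inner(cexp(3), cexp(3))      # k = l: expect 2*pi
diff = l2_inner(cexp(2), cexp(5))      # k != l: expect 0

print(same, 2 * math.pi)
print(abs(diff))                       # at rounding-error level
```

For these periodic integrands the midpoint sums reproduce the exact values $2\pi$ and $0$ up to rounding, provided the number of sample points exceeds $|k - l|$.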


Exercises 


3.6.26.  Determine  whether  the  indicated  sets  of  complex  vectors  are  linearly  independent  or 


dependent.  (a) 


O) 


(f) 


■2+  i 
i 


1 

3  i 


i 
1 

4  —  3  i 

1 


1 

i 


( b ) 


2  i 

1  -  5  i 


!+  n  [  2 
1  y  ’  V 1  -  1 

/ 1  +  2i  \ 

(e) 


V 


2 

0 


/ 


(c) 

2 
0 

v 1  — 1  y 


1  +  3i 

2  -  i 


1  - 


3  i 
i 


/ 


\ 


\ 

1 

/  1  +  2  i  \ 
-3 

1 

[*-()■ 

te) 

p  +  i  \ 

2-  i  , 

1  -  i  \ 

—  3i  , 

(-1+  i  \ 
2  +  3  i 

/  V  0  / 

^  1  / 

V  i  7 

\  1  —  2  i  / 

C  +  2i  / 

3.6.27.  True  or  false:  The  set  of  complex  vectors  of  the  form  [  )  for  z  G  C  is  a  subspace  of  Cz. 


3.6.28. (a) Determine whether the vectors $\mathbf{v}_1 = (1, i, 0)^T$, $\mathbf{v}_2 = (0, 1 + i, 2)^T$, $\mathbf{v}_3 = (-1 + i, 1 + i, -1)^T$ are linearly independent or linearly dependent. (b) Do they form a basis of $\mathbb{C}^3$? (c) Compute the Hermitian norm of each vector. (d) Compute the Hermitian dot products between all different pairs. Which vectors are orthogonal?


3.6.29. Find the dimension of and a basis for the following subspaces of $\mathbb{C}^3$: (a) The set of all complex multiples of $(1, i, 1 - i)^T$. (b) The plane $z_1 + i\,z_2 + (1 - i)\,z_3 = 0$. (c) The image of the matrix $A = \begin{pmatrix} 1 & i & 2 - i \\ 2 + i & 1 + 3i & -1 - i \end{pmatrix}$. (d) The kernel of the same matrix. (e) The set of vectors that are orthogonal to $(1 - i, 2i, 1 + i)^T$.

3.6.30.  Find  bases  for  the  four  fundamental  subspaces  associated  with  the  complex  matrices 

/  i  -1  2  -  i 


(a) 


i  2 
-1  2  i 


(b) 


>  -1  +  i  1  —  2  i 

-4  3  -  i  1  +  i 


(c) 


V 


—  1  +  2 i  -2-  i  3 
i  -1  1  +  i 
3.6.31. Prove that $\mathbf{v} = \mathbf{x} + i\,\mathbf{y}$ and $\bar{\mathbf{v}} = \mathbf{x} - i\,\mathbf{y}$ are linearly independent complex vectors if and only if their real and imaginary parts $\mathbf{x}$ and $\mathbf{y}$ are linearly independent real vectors.

3.6.32. Prove that the space of complex $m \times n$ matrices is a complex vector space. What is its dimension?

3.6.33.  Determine  which  of  the  following  are  subspaces  of  the  vector  space  consisting  of  all 
complex  2x2  matrices,  (a)  All  matrices  with  real  diagonals,  (b)  All  matrices  for  which 
the  sum  of  the  diagonal  entries  is  zero,  (c)  All  singular  complex  matrices,  (d)  All  matrices 

whose  determinant  is  real,  (e)  All  matrices  of  the  form  ^  ^  ,  where  a,  6  G  C. 

3.6.34. True or false: The set of all complex-valued functions $u(x) = v(x) + i\,w(x)$ with $u(0) = i$ is a subspace of the vector space of complex-valued functions.

3.6.35. Let $V$ denote the complex vector space spanned by the functions $1$, $e^{ix}$ and $e^{-ix}$, where $x$ is a real variable. Which of the following functions belong to $V$? (a) $\sin x$, (b) $\cos x - 2i \sin x$, (c) $\cosh x$, (d) $\sin^2 \tfrac{1}{2} x$, (e) $\cos^2 x$?

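3.6.36. Prove that the following define Hermitian inner products on $\mathbb{C}^2$: (a) $\langle \mathbf{v}, \mathbf{w} \rangle = v_1 \bar w_1 + 2\,v_2 \bar w_2$, (b) $\langle \mathbf{v}, \mathbf{w} \rangle = v_1 \bar w_1 + i\,v_1 \bar w_2 - i\,v_2 \bar w_1 + 2\,v_2 \bar w_2$.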

3.6.37.  Which  of  the  following  define  inner  products  on  C  ?  (a)  (v,w)  =  v1w1  +  2iv2w2: 

(b)  (v,w)  =  v1w1  +  2v2w2,  (c)  (v,w)  =  v1w2  +  v2w1,  (d)  (v,w>  = 

2v1  w1  -\-v1  w2  +x2  w1  -\-2v2w2:  (e)  ( v  ,  w  )  =  2v1  w1  +  (1+  i )  v1  w2  +  (1  —  i )  v2  w1  +  3x2  w2. 


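♦ 3.6.38. Let $A = A^T$ be a real symmetric $n \times n$ matrix. Show that $(A\mathbf{v}) \cdot \mathbf{w} = \mathbf{v} \cdot (A\mathbf{w})$ for all $\mathbf{v}, \mathbf{w} \in \mathbb{C}^n$.

3.6.39. Let $\mathbf{z} = \mathbf{x} + i\,\mathbf{y} \in \mathbb{C}^n$. (a) Prove that, for the Hermitian dot product, $\|\mathbf{z}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2$. (b) Does this formula remain valid under a more general Hermitian inner product on $\mathbb{C}^n$?

♦ 3.6.40. Let $V$ be a complex inner product space. Prove that, for all $\mathbf{z}, \mathbf{w} \in V$,

(a) $\|\mathbf{z} + \mathbf{w}\|^2 = \|\mathbf{z}\|^2 + 2 \operatorname{Re} \langle \mathbf{z}, \mathbf{w} \rangle + \|\mathbf{w}\|^2$;

(b) $\langle \mathbf{z}, \mathbf{w} \rangle = \tfrac{1}{4} \bigl( \|\mathbf{z} + \mathbf{w}\|^2 - \|\mathbf{z} - \mathbf{w}\|^2 + i\,\|\mathbf{z} + i\,\mathbf{w}\|^2 - i\,\|\mathbf{z} - i\,\mathbf{w}\|^2 \bigr)$.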


♦ 3.6.41. (a) How would you define the angle between two elements of a complex inner product space? (b) What is the angle between $(-1, 2 - i, -1 + 2i)^T$ and $(-2 - i, -i, 1 - i)^T$ relative to the Hermitian dot product?

3.6.42. Let $0 \ne \mathbf{v} \in \mathbb{C}^n$. Which scalar multiples $c\,\mathbf{v}$ have the same Hermitian norm as $\mathbf{v}$?

♦ 3.6.43. Prove the Cauchy-Schwarz inequality (3.103) and the triangle inequality (3.104) for a general complex inner product. Hint: Use Exercises 3.6.8, 3.6.40(a).

♦ 3.6.44. The Hermitian adjoint of a complex $m \times n$ matrix $A$ is the complex conjugate of its transpose, written $A^\dagger = \overline{A^T} = \bar A^{\,T}$. For example, if $A = \begin{pmatrix} 1 + i & 2i \\ -3 & 2 - 5i \end{pmatrix}$, then $A^\dagger = \begin{pmatrix} 1 - i & -3 \\ -2i & 2 + 5i \end{pmatrix}$. Prove that (a) $(A^\dagger)^\dagger = A$, (b) $(zA + wB)^\dagger = \bar z\,A^\dagger + \bar w\,B^\dagger$ for $z, w \in \mathbb{C}$, (c) $(AB)^\dagger = B^\dagger A^\dagger$.

♦ 3.6.45. A complex matrix $H$ is called Hermitian if it equals its Hermitian adjoint, $H^\dagger = H$, as defined in the preceding exercise. (a) Prove that the diagonal entries of a Hermitian matrix are real. (b) Prove that $(H\mathbf{z}) \cdot \mathbf{w} = \mathbf{z} \cdot (H\mathbf{w})$ for $\mathbf{z}, \mathbf{w} \in \mathbb{C}^n$. (c) Prove that every Hermitian inner product on $\mathbb{C}^n$ has the form $\langle \mathbf{z}, \mathbf{w} \rangle = \mathbf{z}^T H\,\bar{\mathbf{w}}$, where $H$ is an $n \times n$ positive definite Hermitian matrix. (d) How would you verify positive definiteness of a complex matrix?

3.6.46. Multiple choice: Let $V$ be a complex normed vector space. How many unit vectors are parallel to a given vector $0 \ne \mathbf{v} \in V$? (a) none; (b) 1; (c) 2; (d) 3; (e) $\infty$; (f) depends upon the vector; (g) depends on the norm. Explain your answer.

0 3.6.47. Let $v_1, \ldots, v_n$ be elements of a complex inner product space. Let $K$ denote the corresponding $n \times n$ Gram matrix, defined in the usual manner.

(a) Prove that $K$ is a Hermitian matrix, as defined in Exercise 3.6.45.

(b) Prove that $K$ is positive semi-definite, meaning $z^T K \bar z \geq 0$ for all $z \in \mathbb{C}^n$.

(c) Prove that $K$ is positive definite if and only if $v_1, \ldots, v_n$ are linearly independent.

3.6.48. For each of the following pairs of complex-valued functions,

(i) compute their $L^2$ norm and Hermitian inner product on the interval $[0, 1]$, and then
(ii) check the validity of the Cauchy-Schwarz and triangle inequalities.

(a) $1,\ e^{i \pi x}$; (b) $x + i,\ x - i$; (c) $i\,x^2,\ (1 - 2i)\,x + 3i$.

3.6.49. Formulate conditions on a weight function $w(x)$ that guarantee that the weighted integral $\langle f, g \rangle = \int_a^b f(x)\, \overline{g(x)}\, w(x)\, dx$ defines an inner product on the space of continuous complex-valued functions on $[a, b]$.

3.6.50. (a) Formulate a general definition of a norm on a complex vector space.

(b) How would you define analogues of the $L^1$, $L^2$ and $L^\infty$ norms on $\mathbb{C}^n$?


Chapter 4
Orthogonality 


Orthogonality is the mathematical formalization of the geometrical property of perpendicularity,
as adapted to general inner product spaces. In linear algebra, bases consisting of
mutually  orthogonal  elements  play  an  essential  role  in  theoretical  developments,  in  a  broad 
range  of  applications,  and  in  the  design  of  practical  numerical  algorithms.  Computations 
become  dramatically  simpler  and  less  prone  to  numerical  instabilities  when  performed  in 
orthogonal  coordinate  systems.  Indeed,  many  large-scale  modern  applications  would  be 
impractical,  if  not  completely  infeasible,  were  it  not  for  the  dramatic  simplifying  power  of 
orthogonality. 

The  duly  famous  Gram-Schmidt  process  will  convert  an  arbitrary  basis  of  an  inner 
product  space  into  an  orthogonal  basis.  In  Euclidean  space,  the  Gram-Schmidt  process  can 
be  reinterpreted  as  a  new  kind  of  matrix  factorization,  in  which  a  nonsingular  matrix  A  = 
Q  R  is  written  as  the  product  of  an  orthogonal  matrix  Q  and  an  upper  triangular  matrix  R. 
The  Q  R  factorization  and  its  generalizations  are  used  in  statistical  data  analysis  as  well  as 
the  design  of  numerical  algorithms  for  computing  eigenvalues  and  eigenvectors.  In  function 
space,  the  Gram-Schmidt  algorithm  is  employed  to  construct  orthogonal  polynomials  and 
other  useful  systems  of  orthogonal  functions. 

Orthogonality is motivated by geometry, and orthogonal matrices, meaning those whose
columns form an orthonormal system, are of fundamental importance in the mathematics
of symmetry, in image processing, and in computer graphics, animation, and cinema,
[5, 12, 72, 73]. The orthogonal projection of a point onto a subspace turns out to be the
closest point or least squares minimizer, as we discuss in Chapter 5. Yet another important
fact is that the four fundamental subspaces of a matrix that were introduced in Chapter 2
come in mutually orthogonal pairs. This observation leads directly to a new characterization
of the compatibility conditions for linear algebraic systems known as the Fredholm
alternative, whose extensions are used in the analysis of linear boundary value problems,
differential equations, and integral equations, [16, 61]. The orthogonality of eigenvector
and  eigenfunction  bases  for  symmetric  matrices  and  self-adjoint  operators  provides  the  key 
to  understanding  the  dynamics  of  discrete  and  continuous  mechanical,  thermodynamical, 
electrical,  and  quantum  mechanical  systems. 

One of the most fertile applications of orthogonal bases is in signal processing. Fourier
analysis decomposes a signal into its simple periodic components, sines and cosines,
which form an orthogonal system of functions, [61, 77]. Modern digital media, such as
CDs, DVDs, and MP3s, are based on discrete data obtained by sampling a physical signal.
The  Discrete  Fourier  Transform  (DFT)  uses  orthogonality  to  decompose  the  sampled  signal 
vector  into  a  linear  combination  of  sampled  trigonometric  functions  (or,  more  accurately, 
complex  exponentials).  Basic  data  compression  and  noise  removal  algorithms  are  applied  to 
the  discrete  Fourier  coefficients,  acting  on  the  observation  that  noise  tends  to  accumulate 
in  the  high-frequency  Fourier  modes.  More  sophisticated  signal  and  image  processing 
techniques,  including  smoothing  and  compression  algorithms,  are  based  on  orthogonal 
wavelet bases, which are discussed in Section 9.7.

Figure 4.1. Orthonormal Bases in $\mathbb{R}^2$ and $\mathbb{R}^3$.


4.1  Orthogonal  and  Orthonormal  Bases 

Let $V$ be a real† inner product space. Recall that two elements $v, w \in V$ are called
orthogonal if their inner product vanishes: $\langle v, w \rangle = 0$. In the case of vectors in Euclidean
space, orthogonality under the dot product means that they meet at a right angle.

A particularly important configuration arises when $V$ admits a basis consisting of mutually orthogonal elements.

Definition 4.1. A basis $u_1, \ldots, u_n$ of an $n$-dimensional inner product space $V$ is called orthogonal if $\langle u_i, u_j \rangle = 0$ for all $i \neq j$. The basis is called orthonormal if, in addition, each vector has unit length: $\| u_i \| = 1$, for all $i = 1, \ldots, n$.


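For the Euclidean space $\mathbb{R}^n$ equipped with the standard dot product, the simplest example of an orthonormal basis is the standard basis
$$e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad \ldots, \qquad e_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$
Orthogonality follows because $e_i \cdot e_j = 0$ for $i \neq j$, while $\| e_i \| = 1$ implies normality.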

Since a basis cannot contain the zero vector, there is an easy way to convert an orthogonal basis to an orthonormal basis. Namely, we replace each basis vector with a unit vector pointing in the same direction, as in Lemma 3.14.

Lemma 4.2. If $v_1, \ldots, v_n$ is an orthogonal basis of a vector space $V$, then the normalized vectors $u_i = v_i / \| v_i \|$, $i = 1, \ldots, n$, form an orthonormal basis.


† The methods can be adapted more or less straightforwardly to the complex realm. The main
complication, as noted in Section 3.6, is that we need to be careful with the order of vectors
appearing in the conjugate symmetric complex inner products. In this chapter, we will be careful
to write the inner product formulas in the proper order so that they retain their validity in complex
vector spaces.

Example 4.3. The vectors
$$v_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 5 \\ -2 \\ 1 \end{pmatrix},$$
are easily seen to form a basis of $\mathbb{R}^3$. Moreover, they are mutually perpendicular, $v_1 \cdot v_2 = v_1 \cdot v_3 = v_2 \cdot v_3 = 0$, and so form an orthogonal basis with respect to the standard dot product on $\mathbb{R}^3$. When we divide each orthogonal basis vector by its length, the result is the orthonormal basis
$$u_1 = \frac{1}{\sqrt 6} \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad u_2 = \frac{1}{\sqrt 5} \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \qquad u_3 = \frac{1}{\sqrt{30}} \begin{pmatrix} 5 \\ -2 \\ 1 \end{pmatrix},$$
satisfying $u_1 \cdot u_2 = u_1 \cdot u_3 = u_2 \cdot u_3 = 0$ and $\| u_1 \| = \| u_2 \| = \| u_3 \| = 1$. The appearance of square roots in the elements of an orthonormal basis is fairly typical.
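Checks of this kind are easy to automate. The following is a minimal Python sketch, not part of the original text: it tests three vectors in $\mathbb{R}^3$ for mutual orthogonality under the dot product and then normalizes them. The helper names `dot` and `normalize` and the particular vectors are illustrative assumptions.

```python
import math

def dot(v, w):
    """Standard Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(v, w))

def normalize(v):
    """Return the unit vector v / ||v||; v is assumed to be nonzero."""
    length = math.sqrt(dot(v, v))
    return [a / length for a in v]

# Illustrative orthogonal basis of R^3 (an assumption of this sketch).
basis = [[1, 2, -1], [0, 1, 2], [5, -2, 1]]

# Mutual orthogonality: every pairwise dot product should vanish.
pairwise = [dot(basis[i], basis[j]) for i in range(3) for j in range(i + 1, 3)]

# Dividing each vector by its length produces an orthonormal basis.
orthonormal = [normalize(v) for v in basis]
lengths = [math.sqrt(dot(u, u)) for u in orthonormal]
```

Here `pairwise` holds the three dot products and `lengths` the norms of the normalized vectors, which should all equal 1 up to rounding error.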


A useful observation is that every orthogonal collection of nonzero vectors is automatically linearly independent.

Proposition 4.4. Let $v_1, \ldots, v_k \in V$ be nonzero, mutually orthogonal elements, so $v_i \neq 0$ and $\langle v_i, v_j \rangle = 0$ for all $i \neq j$. Then $v_1, \ldots, v_k$ are linearly independent.

Proof: Suppose
$$c_1 v_1 + \cdots + c_k v_k = 0.$$
Let us take the inner product of this equation with any $v_i$. Using linearity of the inner product and orthogonality, we compute
$$0 = \langle c_1 v_1 + \cdots + c_k v_k, v_i \rangle = c_1 \langle v_1, v_i \rangle + \cdots + c_k \langle v_k, v_i \rangle = c_i \langle v_i, v_i \rangle = c_i \| v_i \|^2.$$
Therefore, given that $v_i \neq 0$, we conclude that $c_i = 0$. Since this holds for all $i = 1, \ldots, k$, the linear independence of $v_1, \ldots, v_k$ follows. Q.E.D.


As a direct corollary, we infer that every collection of nonzero orthogonal vectors forms a basis for its span.

Theorem 4.5. Suppose $v_1, \ldots, v_n \in V$ are nonzero, mutually orthogonal elements of an inner product space $V$. Then $v_1, \ldots, v_n$ form an orthogonal basis for their span $W = \operatorname{span}\{v_1, \ldots, v_n\} \subset V$, which is therefore a subspace of dimension $n = \dim W$. In particular, if $\dim V = n$, then $v_1, \ldots, v_n$ form an orthogonal basis for $V$.


Orthogonality  is  also  of  profound  significance  for  function  spaces.  Here  is  a  relatively 
simple  example. 

Example 4.6. Consider the vector space $\mathcal{P}^{(2)}$ consisting of all quadratic polynomials $p(x) = \alpha + \beta x + \gamma x^2$, equipped with the $L^2$ inner product and norm
$$\langle p, q \rangle = \int_0^1 p(x)\, q(x)\, dx, \qquad \| p \| = \sqrt{\langle p, p \rangle} = \sqrt{\int_0^1 p(x)^2\, dx}.$$
The standard monomials $1, x, x^2$ do not form an orthogonal basis. Indeed,
$$\langle 1, x \rangle = \tfrac12, \qquad \langle 1, x^2 \rangle = \tfrac13, \qquad \langle x, x^2 \rangle = \tfrac14.$$
One orthogonal basis of $\mathcal{P}^{(2)}$ is provided by the following polynomials:
$$p_1(x) = 1, \qquad p_2(x) = x - \tfrac12, \qquad p_3(x) = x^2 - x + \tfrac16. \tag{4.1}$$
Indeed, one easily verifies that $\langle p_1, p_2 \rangle = \langle p_1, p_3 \rangle = \langle p_2, p_3 \rangle = 0$, while
$$\| p_1 \| = 1, \qquad \| p_2 \| = \frac{1}{\sqrt{12}} = \frac{1}{2\sqrt 3}, \qquad \| p_3 \| = \frac{1}{\sqrt{180}} = \frac{1}{6\sqrt 5}.$$
The corresponding orthonormal basis is found by dividing each orthogonal basis element by its norm:
$$u_1(x) = 1, \qquad u_2(x) = \sqrt 3\, (2x - 1), \qquad u_3(x) = \sqrt 5\, (6x^2 - 6x + 1).$$
In Section 4.5 below, we will learn how to systematically construct such orthogonal systems of polynomials.
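The inner products in this example can be verified exactly by machine. The sketch below, which is not from the original text, stores a polynomial as its list of coefficients $[a_0, a_1, a_2, \ldots]$ and uses the fact that $\int_0^1 x^k\, dx = 1/(k+1)$. The names `poly_mul` and `inner` are hypothetical helpers chosen for illustration.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists [a0, a1, a2, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """L^2 inner product on [0, 1]: the integral of p(x) q(x) from 0 to 1."""
    # The integral of x^k over [0, 1] equals 1 / (k + 1).
    return sum(c / (k + 1) for k, c in enumerate(poly_mul(p, q)))

F = Fraction
one, x, x2 = [F(1)], [F(0), F(1)], [F(0), F(0), F(1)]
p1 = [F(1)]                   # 1
p2 = [F(-1, 2), F(1)]         # x - 1/2
p3 = [F(1, 6), F(-1), F(1)]   # x^2 - x + 1/6

monomial_products = [inner(one, x), inner(one, x2), inner(x, x2)]
orthogonal_products = [inner(p1, p2), inner(p1, p3), inner(p2, p3)]
squared_norms = [inner(p, p) for p in (p1, p2, p3)]
```

Working with `Fraction` avoids rounding, so the vanishing inner products come out as exact zeros and the squared norms as exact rationals.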


Exercises 


4.1.1. Let $\mathbb{R}^2$ have the standard dot product. Classify the following pairs of vectors as
(i) basis, (ii) orthogonal basis, and/or (iii) orthonormal basis:

/  J_\  /  _  J_ 


(a)  vi  =  (  2)  ’  v2  =  (l)  ;  (b)  vi  = 


0)  vr  =  (  3)  ,  v2  =  (_g  )  ;  (e)  vx  = 


V7 2/ 

-1 


>  v2 


0  I  ’  V2  - 


0 

3 


S);(c)  Vi=(:0’ 


2 

2 


;  (f)  vi  = 


-  1  5 

4 

5 


»  v2  = 


4 

5 
3 
5 


•j 

4.1.2. Let $\mathbb{R}^3$ have the standard dot product. Classify the following sets of vectors as
(i) basis, (ii) orthogonal basis, and/or (iii) orthonormal basis:


i\  /0\ 
-1  ,  1 
1/  V1/ 


;  (0 


13 
3 
5 
48 


/  12  \ 

13 

0 


/1\ 


13 

4 

5 

36 


/ 


;  (c) 


\ _ /  \  22  / 

\  65  /  V  13  7  V  65  / 


V 


o\ 

1 

C2 

1 

72/ 


/  _  J-\ 

72 

0 

V  75  / 


/  J_\ 

72 

1 

72 

V  0/ 


4.1.3.  Repeat  Exercise  4.1.1,  but  use  the  weighted  inner  product  (v,w)  =  v1w1  +  \V2W2 
instead  of  the  dot  product. 

4.1.4. Show that the standard basis vectors $e_1, e_2, e_3$ form an orthogonal basis with respect to the weighted inner product $\langle v, w \rangle = v_1 w_1 + 2\, v_2 w_2 + 3\, v_3 w_3$ on $\mathbb{R}^3$. Find an orthonormal basis for this inner product space.

4.1.5. Find all values of $a$ such that the vectors $(a, 1)^T$, $(-a, 1)^T$ form an orthogonal basis of $\mathbb{R}^2$ under (a) the dot product; (b) the weighted inner product $\langle v, w \rangle = 3\, v_1 w_1 + 2\, v_2 w_2$; (c) the inner product prescribed by the positive definite matrix $K = \begin{pmatrix} 2 & -1 \\ -1 & 3 \end{pmatrix}$.


4.1.6. Find all possible values of $a$ and $b$ in the inner product $\langle v, w \rangle = a\, v_1 w_1 + b\, v_2 w_2$ that make the vectors $(1, 2)^T$, $(-1, 1)^T$ an orthogonal basis in $\mathbb{R}^2$.

4.1.7. Answer Exercise 4.1.6 for the vectors (a) $(2, 3)^T$, $(-2, 2)^T$; (b) $(1, 4)^T$, $(2, 1)^T$.

4.1.8. Find an inner product such that the vectors $(-1, 2)^T$ and $(1, 2)^T$ form an orthonormal basis of $\mathbb{R}^2$.

4.1.9. True or false: If $v_1, v_2, v_3$ are a basis for $\mathbb{R}^3$, then they form an orthogonal basis under some appropriately weighted inner product $\langle v, w \rangle = a\, v_1 w_1 + b\, v_2 w_2 + c\, v_3 w_3$.

O 4.1.10. The cross product between two vectors in $\mathbb{R}^3$ is the vector defined by the formula
$$v \times w = \begin{pmatrix} v_2 w_3 - v_3 w_2 \\ v_3 w_1 - v_1 w_3 \\ v_1 w_2 - v_2 w_1 \end{pmatrix}, \qquad \text{where} \qquad v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}, \quad w = \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix}. \tag{4.2}$$
(a) Show that $u = v \times w$ is orthogonal, under the dot product, to both $v$ and $w$. (b) Show that $v \times w = 0$ if and only if $v$ and $w$ are parallel. (c) Prove that if $v, w \in \mathbb{R}^3$ are orthogonal nonzero vectors, then $u = v \times w$, $v$, $w$ form an orthogonal basis of $\mathbb{R}^3$. (d) True or false: If $v, w \in \mathbb{R}^3$ are orthogonal unit vectors, then $v$, $w$ and $u = v \times w$ form an orthonormal basis of $\mathbb{R}^3$.


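0 4.1.11. Prove that every orthonormal basis of $\mathbb{R}^2$ under the standard dot product has the form $u_1 = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}$ and $u_2 = \pm \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}$ for some $0 \leq \theta < 2\pi$ and some choice of $\pm$ sign.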

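0 4.1.12. Given angles $\theta, \varphi, \psi$, prove that the vectors
$$u_1 = \begin{pmatrix} \cos\psi \cos\varphi - \cos\theta \sin\varphi \sin\psi \\ -\sin\psi \cos\varphi - \cos\theta \sin\varphi \cos\psi \\ \sin\theta \sin\varphi \end{pmatrix}, \quad u_2 = \begin{pmatrix} \cos\psi \sin\varphi + \cos\theta \cos\varphi \sin\psi \\ -\sin\psi \sin\varphi + \cos\theta \cos\varphi \cos\psi \\ -\sin\theta \cos\varphi \end{pmatrix}, \quad u_3 = \begin{pmatrix} \sin\theta \sin\psi \\ \sin\theta \cos\psi \\ \cos\theta \end{pmatrix},$$
form an orthonormal basis of $\mathbb{R}^3$ under the standard dot product. Remark. It can be proved, [31; p. 147], that every orthonormal basis of $\mathbb{R}^3$ has the form $u_1, u_2, \pm u_3$ for some choice of angles $\theta, \varphi, \psi$.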

G 4.1.13. (a) Show that $v_1, \ldots, v_n$ form an orthonormal basis of $\mathbb{R}^n$ for the inner product $\langle v, w \rangle = v^T K w$ for $K > 0$ if and only if $A^T K A = I$, where $A = ( v_1\ v_2\ \ldots\ v_n )$.

(b) Prove that every basis of $\mathbb{R}^n$ is an orthonormal basis with respect to some inner product. Is the inner product uniquely determined? (c) Find the inner product on $\mathbb{R}^2$ that makes $v_1 = (1, 1)^T$, $v_2 = (2, 3)^T$ into an orthonormal basis. (d) Find the inner product on $\mathbb{R}^3$ that makes $v_1 = (1, 1, 1)^T$, $v_2 = (1, 1, 2)^T$, $v_3 = (1, 2, 3)^T$ an orthonormal basis.

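4.1.14. Describe all orthonormal bases of $\mathbb{R}^2$ for the inner products
(a) $\langle v, w \rangle = v^T \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} w$; (b) $\langle v, w \rangle = v^T \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} w$.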


4.1.15. Let $v$ and $w$ be elements of an inner product space. Prove that $\| v + w \|^2 = \| v \|^2 + \| w \|^2$ if and only if $v, w$ are orthogonal. Explain why this formula can be viewed as the generalization of the Pythagorean Theorem.

4.1.16. Prove that if $v_1, v_2$ form a basis of an inner product space $V$ and $\| v_1 \| = \| v_2 \|$, then $v_1 + v_2$ and $v_1 - v_2$ form an orthogonal basis of $V$.

4.1.17. Suppose $v_1, \ldots, v_k$ are nonzero mutually orthogonal elements of an inner product space $V$. Write down their Gram matrix. Why is it nonsingular?


4.1.18.  Let  V  =  V ^  be  the  vector  space  consisting  of  linear  polynomials  p(t)  =  at  +  b. 
(a)  Carefully  explain  why  (p,q)  =  J  tp(t )  q(t )  dt  defines  an  inner  product  on  V. 


(b)  Find  all  polynomials  p(t)  =  at  +  b  £  V  that  are  orthogonal  to  Pi(t)  =  1  based  on 
this  inner  product,  (c)  Use  part  (b)  to  construct  an  orthonormal  basis  of  V  for  this  inner 
product,  (d)  Find  an  orthonormal  basis  of  the  space  of  quadratic  polynomials  for  the 
same  inner  product.  Hint:  First  find  a  quadratic  polynomial  that  is  orthogonal  to  the  basis 
you constructed in part (c).

4.1.19. Explain why the functions $\cos x$, $\sin x$ form an orthogonal basis for the space of solutions to the differential equation $y'' + y = 0$ under the $L^2$ inner product on $[-\pi, \pi]$.

4.1.20. Do the functions $e^{x/2}$, $e^{-x/2}$ form an orthogonal basis for the space of solutions to the differential equation $4 y'' - y = 0$ under the $L^2$ inner product on $[0, 1]$? If not, can you find an orthogonal basis of the solution space?


Computations  in  Orthogonal  Bases 

What  are  the  advantages  of  orthogonal  and  orthonormal  bases?  Once  one  has  a  basis  of 
a  vector  space,  a  key  issue  is  how  to  express  other  elements  as  linear  combinations  of  the 
basis  elements  —  that  is,  to  find  their  coordinates  in  the  prescribed  basis.  In  general,  this 
is  not  so  easy,  since  it  requires  solving  a  system  of  linear  equations,  as  described  in  (2.23). 
In  high-dimensional  situations  arising  in  applications,  computing  the  solution  may  require 
a  considerable,  if  not  infeasible,  amount  of  time  and  effort. 

However,  if  the  basis  is  orthogonal,  or,  even  better,  orthonormal,  then  the  change  of  basis 
computation  requires  almost  no  work.  This  is  the  crucial  insight  underlying  the  efficacy  of 
both  discrete  and  continuous  Fourier  analysis  in  signal,  image,  and  video  processing,  least 
squares  approximations,  the  statistical  analysis  of  large  data  sets,  and  a  multitude  of  other 
applications,  both  classical  and  modern. 

Theorem 4.7. Let $u_1, \ldots, u_n$ be an orthonormal basis for an inner product space $V$. Then one can write any element $v \in V$ as a linear combination
$$v = c_1 u_1 + \cdots + c_n u_n, \tag{4.3}$$
in which its coordinates
$$c_i = \langle v, u_i \rangle, \qquad i = 1, \ldots, n, \tag{4.4}$$
are explicitly given as inner products. Moreover, its norm is given by the Pythagorean formula
$$\| v \| = \sqrt{c_1^2 + \cdots + c_n^2} = \sqrt{\sum_{i=1}^n \langle v, u_i \rangle^2}, \tag{4.5}$$
namely, the square root of the sum of the squares of its orthonormal basis coordinates.

Proof: Let us compute the inner product of the element (4.3) with one of the basis vectors. Using the orthonormality conditions
$$\langle u_i, u_j \rangle = \begin{cases} 0, & i \neq j, \\ 1, & i = j, \end{cases} \tag{4.6}$$
and bilinearity of the inner product, we obtain
$$\langle v, u_i \rangle = \Bigl\langle \sum_{j=1}^n c_j u_j,\ u_i \Bigr\rangle = \sum_{j=1}^n c_j \langle u_j, u_i \rangle = c_i \| u_i \|^2 = c_i.$$
To prove formula (4.5), we similarly expand
$$\| v \|^2 = \langle v, v \rangle = \Bigl\langle \sum_{i=1}^n c_i u_i,\ \sum_{j=1}^n c_j u_j \Bigr\rangle = \sum_{i,j=1}^n c_i c_j \langle u_i, u_j \rangle = \sum_{i=1}^n c_i^2,$$
again making use of the orthonormality of the basis elements. Q.E.D.

It is worth emphasizing that the Pythagorean-type formula (4.5) is valid for all inner products.


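Example 4.8. Let us rewrite the vector $v = (1, 1, 1)^T$ in terms of the orthonormal basis
$$u_1 = \begin{pmatrix} \frac{1}{\sqrt 6} \\ \frac{2}{\sqrt 6} \\ -\frac{1}{\sqrt 6} \end{pmatrix}, \qquad u_2 = \begin{pmatrix} 0 \\ \frac{1}{\sqrt 5} \\ \frac{2}{\sqrt 5} \end{pmatrix}, \qquad u_3 = \begin{pmatrix} \frac{5}{\sqrt{30}} \\ -\frac{2}{\sqrt{30}} \\ \frac{1}{\sqrt{30}} \end{pmatrix},$$
constructed in Example 4.3. Computing the dot products
$$v \cdot u_1 = \frac{2}{\sqrt 6}, \qquad v \cdot u_2 = \frac{3}{\sqrt 5}, \qquad v \cdot u_3 = \frac{4}{\sqrt{30}},$$
we immediately conclude that
$$v = \frac{2}{\sqrt 6}\, u_1 + \frac{3}{\sqrt 5}\, u_2 + \frac{4}{\sqrt{30}}\, u_3.$$
Needless to say, a direct computation based on solving the associated linear system, as in Chapter 2, is more tedious.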
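To make the coordinate formula (4.4) and the Pythagorean formula (4.5) concrete, here is a small Python sketch, offered as an illustration rather than as part of the original text. It uses floating-point arithmetic and a hypothetical helper `dot`; the orthonormal basis is entered by hand as an assumption of the sketch.

```python
import math

def dot(v, w):
    """Standard Euclidean dot product."""
    return sum(a * b for a, b in zip(v, w))

s6, s5, s30 = math.sqrt(6), math.sqrt(5), math.sqrt(30)
u = [
    [1 / s6, 2 / s6, -1 / s6],
    [0.0, 1 / s5, 2 / s5],
    [5 / s30, -2 / s30, 1 / s30],
]
v = [1.0, 1.0, 1.0]

# Formula (4.4): each coordinate is a single inner product, no linear solve.
coords = [dot(v, ui) for ui in u]

# Rebuild v from its coordinates, as in (4.3).
rebuilt = [sum(c * ui[k] for c, ui in zip(coords, u)) for k in range(3)]

# Formula (4.5): the norm computed from the coordinates alone.
norm_from_coords = math.sqrt(sum(c * c for c in coords))
```

The point of the design is that `coords` costs one dot product per basis vector, whereas a general basis would require solving a full linear system.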

While  passage  from  an  orthogonal  basis  to  its  orthonormal  version  is  elementary  —  one 
simply  divides  each  basis  element  by  its  norm  —  we  shall  often  find  it  more  convenient  to 
work  directly  with  the  unnormalized  version.  The  next  result  provides  the  corresponding 
formula  expressing  a  vector  in  terms  of  an  orthogonal,  but  not  necessarily  orthonormal 
basis.  The  proof  proceeds  exactly  as  in  the  orthonormal  case,  and  details  are  left  to  the 
reader. 


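Theorem 4.9. If $v_1, \ldots, v_n$ form an orthogonal basis, then the corresponding coordinates of a vector
$$v = a_1 v_1 + \cdots + a_n v_n$$
are given by
$$a_i = \frac{\langle v, v_i \rangle}{\| v_i \|^2}. \tag{4.7}$$
In this case, its norm can be computed using the formula
$$\| v \|^2 = \sum_{i=1}^n a_i^2\, \| v_i \|^2 = \sum_{i=1}^n \left( \frac{\langle v, v_i \rangle}{\| v_i \|} \right)^2. \tag{4.8}$$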


Equation  (4.7),  along  with  its  orthonormal  simplification  (4.4),  is  one  of  the  most  useful 
formulas  we  shall  establish,  and  applications  will  appear  repeatedly  throughout  this  text 
and  beyond. 

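Example 4.10. The wavelet basis
$$v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \qquad v_4 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix},$$
introduced in Example 2.35 is, in fact, an orthogonal basis of $\mathbb{R}^4$. The norms are
$$\| v_1 \| = 2, \qquad \| v_2 \| = 2, \qquad \| v_3 \| = \sqrt 2, \qquad \| v_4 \| = \sqrt 2.$$
Therefore, using (4.7), we can readily express any vector as a linear combination of the wavelet basis vectors. For example,
$$v = \begin{pmatrix} 4 \\ -2 \\ 1 \\ 5 \end{pmatrix} = 2\, v_1 - v_2 + 3\, v_3 - 2\, v_4,$$
where the wavelet coordinates are computed directly by
$$\frac{\langle v, v_1 \rangle}{\| v_1 \|^2} = \frac{8}{4} = 2, \qquad \frac{\langle v, v_2 \rangle}{\| v_2 \|^2} = \frac{-4}{4} = -1, \qquad \frac{\langle v, v_3 \rangle}{\| v_3 \|^2} = \frac{6}{2} = 3, \qquad \frac{\langle v, v_4 \rangle}{\| v_4 \|^2} = \frac{-4}{2} = -2.$$
This is clearly quicker than solving the linear system, as we did earlier in Example 2.35. Finally, we note that
$$46 = \| v \|^2 = 2^2\, \| v_1 \|^2 + (-1)^2\, \| v_2 \|^2 + 3^2\, \| v_3 \|^2 + (-2)^2\, \| v_4 \|^2 = 4 \cdot 4 + 1 \cdot 4 + 9 \cdot 2 + 4 \cdot 2,$$
in conformity with (4.8).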
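The orthogonal-basis formulas (4.7) and (4.8) translate directly into a few lines of code. The sketch below is an illustration under stated assumptions, not part of the original text: it uses exact rational arithmetic, the dot product on $\mathbb{R}^4$, and a hypothetical function name `orthogonal_coordinates`.

```python
from fractions import Fraction

def dot(v, w):
    """Standard Euclidean dot product."""
    return sum(a * b for a, b in zip(v, w))

def orthogonal_coordinates(v, basis):
    """Coordinates a_i = <v, v_i> / ||v_i||^2 relative to an orthogonal basis."""
    return [Fraction(dot(v, b), dot(b, b)) for b in basis]

wavelets = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 0, 0], [0, 0, 1, -1]]
v = [4, -2, 1, 5]

a = orthogonal_coordinates(v, wavelets)

# Formula (4.8): squared norm from the coordinates and the basis norms.
norm_squared = sum(ai * ai * dot(b, b) for ai, b in zip(a, wavelets))
```

Note that `orthogonal_coordinates` silently gives wrong answers if the supplied basis is not orthogonal; a production version would check that first.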


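Example 4.11. The same formulas are equally valid for orthogonal bases in function spaces. For example, to express a quadratic polynomial
$$p(x) = c_1\, p_1(x) + c_2\, p_2(x) + c_3\, p_3(x) = c_1 + c_2 \bigl( x - \tfrac12 \bigr) + c_3 \bigl( x^2 - x + \tfrac16 \bigr)$$
in terms of the orthogonal basis (4.1), we merely compute the $L^2$ inner product integrals
$$c_1 = \frac{\langle p, p_1 \rangle}{\| p_1 \|^2} = \int_0^1 p(x)\, dx, \qquad c_2 = \frac{\langle p, p_2 \rangle}{\| p_2 \|^2} = 12 \int_0^1 p(x) \bigl( x - \tfrac12 \bigr)\, dx,$$
$$c_3 = \frac{\langle p, p_3 \rangle}{\| p_3 \|^2} = 180 \int_0^1 p(x) \bigl( x^2 - x + \tfrac16 \bigr)\, dx.$$
Thus, for example, the coefficients for $p(x) = x^2 + x + 1$ are
$$c_1 = \int_0^1 (x^2 + x + 1)\, dx = \frac{11}{6}, \qquad c_2 = 12 \int_0^1 (x^2 + x + 1) \bigl( x - \tfrac12 \bigr)\, dx = 2,$$
$$c_3 = 180 \int_0^1 (x^2 + x + 1) \bigl( x^2 - x + \tfrac16 \bigr)\, dx = 1,$$
and so
$$p(x) = x^2 + x + 1 = \frac{11}{6} + 2 \bigl( x - \tfrac12 \bigr) + \bigl( x^2 - x + \tfrac16 \bigr).$$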
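The same coefficient computation can be sketched in code. The following Python fragment is illustrative only: it assumes polynomials are stored as coefficient lists $[a_0, a_1, a_2]$, uses a hypothetical helper `inner` for the $L^2$ inner product on $[0, 1]$, and applies the orthogonal basis formula (4.7) with exact fractions.

```python
from fractions import Fraction as F

def inner(p, q):
    """L^2 inner product on [0, 1] of polynomials stored as coefficient lists."""
    total = F(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            total += F(a) * F(b) / (i + j + 1)  # integral of x^(i+j) on [0, 1]
    return total

# Orthogonal basis: 1, x - 1/2, x^2 - x + 1/6.
basis = [[F(1)], [F(-1, 2), F(1)], [F(1, 6), F(-1), F(1)]]
p = [1, 1, 1]  # p(x) = 1 + x + x^2

# Formula (4.7): c_i = <p, p_i> / ||p_i||^2.
coeffs = [inner(p, q) / inner(q, q) for q in basis]

# Reassemble the polynomial from the expansion as a consistency check.
rebuilt = [F(0)] * 3
for c, q in zip(coeffs, basis):
    for k, qk in enumerate(q):
        rebuilt[k] += c * qk
```

The final loop recombines the basis polynomials with the computed coefficients, so `rebuilt` should reproduce the coefficient list of `p`.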


Example 4.12. Perhaps the most important example of an orthogonal basis is provided by the basic trigonometric functions. Let $\mathcal{T}^{(n)}$ denote the vector space consisting of all trigonometric polynomials
$$T(x) = \sum_{0 \leq j + k \leq n} a_{jk}\, (\sin x)^j (\cos x)^k \tag{4.10}$$
of degree $\leq n$. The individual monomials $(\sin x)^j (\cos x)^k$ span $\mathcal{T}^{(n)}$, but, as we saw in Example 2.20, they do not form a basis, owing to identities stemming from the basic trigonometric formula $\cos^2 x + \sin^2 x = 1$. Exercise 3.6.21 introduced a more convenient spanning set consisting of the $2n + 1$ functions
$$1, \quad \cos x, \quad \sin x, \quad \cos 2x, \quad \sin 2x, \quad \ldots \quad \cos nx, \quad \sin nx. \tag{4.11}$$
Let us prove that these functions form an orthogonal basis of $\mathcal{T}^{(n)}$ with respect to the $L^2$ inner product and norm:
$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\, g(x)\, dx, \qquad \| f \|^2 = \int_{-\pi}^{\pi} f(x)^2\, dx. \tag{4.12}$$
The elementary integration formulas
$$\int_{-\pi}^{\pi} \cos kx \cos lx\, dx = \begin{cases} 0, & k \neq l, \\ 2\pi, & k = l = 0, \\ \pi, & k = l \neq 0, \end{cases} \qquad \int_{-\pi}^{\pi} \sin kx \sin lx\, dx = \begin{cases} 0, & k \neq l, \\ \pi, & k = l \neq 0, \end{cases}$$
$$\int_{-\pi}^{\pi} \cos kx \sin lx\, dx = 0, \tag{4.13}$$
which are valid for all nonnegative integers $k, l \geq 0$, imply the orthogonality relations
$$\langle \cos kx, \cos lx \rangle = \langle \sin kx, \sin lx \rangle = 0, \quad k \neq l, \qquad \langle \cos kx, \sin lx \rangle = 0,$$
$$\| \cos kx \| = \| \sin kx \| = \sqrt{\pi}, \quad k \neq 0, \qquad \| 1 \| = \sqrt{2\pi}. \tag{4.14}$$
Theorem 4.5 now assures us that the functions (4.11) form a basis for $\mathcal{T}^{(n)}$. One consequence is that $\dim \mathcal{T}^{(n)} = 2n + 1$, a fact that is not so easy to establish directly.

Orthogonality of the trigonometric functions (4.11) means that we can compute the coefficients $a_0, \ldots, a_n, b_1, \ldots, b_n$ of any trigonometric polynomial
$$f(x) = a_0 + \sum_{k=1}^n \bigl( a_k \cos kx + b_k \sin kx \bigr) \tag{4.15}$$
by an explicit integration formula. Namely,
$$a_0 = \frac{\langle f, 1 \rangle}{\| 1 \|^2} = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, dx, \qquad a_k = \frac{\langle f, \cos kx \rangle}{\| \cos kx \|^2} = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx\, dx,$$
$$b_k = \frac{\langle f, \sin kx \rangle}{\| \sin kx \|^2} = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx\, dx, \qquad k \geq 1. \tag{4.16}$$
These fundamental formulas play an essential role in the theory and applications of Fourier series, [61, 79, 77].
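As a numerical illustration of the coefficient formulas (4.16), the following Python sketch approximates the integrals with a midpoint rule and recovers the coefficients of a known trigonometric polynomial. It is a rough sketch, not part of the original text: the function names `integrate` and `fourier_coefficients`, the number of quadrature steps, and the test function are all assumptions made for the example.

```python
import math

def integrate(f, steps=2000):
    """Approximate the integral of f over [-pi, pi] by the midpoint rule."""
    h = 2 * math.pi / steps
    return h * sum(f(-math.pi + (i + 0.5) * h) for i in range(steps))

def fourier_coefficients(f, n):
    """Return a_0, [a_1..a_n], [b_1..b_n] following formula (4.16)."""
    a0 = integrate(f) / (2 * math.pi)
    a = [integrate(lambda x, k=k: f(x) * math.cos(k * x)) / math.pi
         for k in range(1, n + 1)]
    b = [integrate(lambda x, k=k: f(x) * math.sin(k * x)) / math.pi
         for k in range(1, n + 1)]
    return a0, a, b

# Illustrative trigonometric polynomial: 3 + 2 cos x - sin 2x.
def sample(x):
    return 3 + 2 * math.cos(x) - math.sin(2 * x)

a0, a, b = fourier_coefficients(sample, 2)
```

For a smooth periodic integrand the midpoint rule is very accurate, so the computed values agree with the coefficients $a_0 = 3$, $a_1 = 2$, $a_2 = 0$, $b_1 = 0$, $b_2 = -1$ up to rounding error.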


Exercises 


T 4.1.21. (a) Prove that the vectors $v_1 = (1, 1, 1)^T$, $v_2 = (1, 1, -2)^T$, $v_3 = (-1, 1, 0)^T$ form an orthogonal basis of $\mathbb{R}^3$ with the dot product. (b) Use orthogonality to write the vector $v = (1, 2, 3)^T$ as a linear combination of $v_1, v_2, v_3$. (c) Verify the formula (4.8) for $\| v \|$. (d) Construct an orthonormal basis, using the given vectors. (e) Write $v$ as a linear combination of the orthonormal basis, and verify (4.5).


,T 


T 


£  12  1 
13’  13’  13 


T 


48 

65’ 


Vo  =  —  er,—  To,7F 


_5_  36 
13’  65 


T 


4.1.22.  (a)  Prove  that  vx  =  (  |,0,  |)  ,  v2  =  ^ -j5,  f|,  ^  j  ,  v3 

o 

form  an  orthonormal  basis  for  M  for  the  usual  dot  product,  (b)  Find  the  coordinates  of 


$v = (1, 1, 1)^T$ relative to this basis. (c) Verify formula (4.5) in this particular case.

4.1.23. Let $\mathbb{R}^2$ have the inner product defined by the positive definite matrix $K = \begin{pmatrix} 2 & -1 \\ -1 & 3 \end{pmatrix}$. (a) Show that $v_1 = (1, 1)^T$, $v_2 = (-2, 1)^T$ form an orthogonal basis. (b) Write the vector $v = (3, 2)^T$ as a linear combination of $v_1, v_2$ using the orthogonality formula (4.7). (c) Verify the formula (4.8) for $\| v \|$. (d) Find an orthonormal basis $u_1, u_2$ for this inner product. (e) Write $v$ as a linear combination of the orthonormal basis, and verify (4.5).

0 4.1.24. (a) Let $u_1, \ldots, u_n$ be an orthonormal basis of a finite-dimensional inner product space $V$. Let $v = c_1 u_1 + \cdots + c_n u_n$ and $w = d_1 u_1 + \cdots + d_n u_n$ be any two elements of $V$. Prove that $\langle v, w \rangle = c_1 d_1 + \cdots + c_n d_n$.

(b) Write down the corresponding inner product formula for an orthogonal basis.

4.1.25. Find an example that demonstrates why equation (4.5) is not valid for a non-orthonormal basis.

4.1.26.  Use  orthogonality  to  write  the  polynomials  l,x  and  x2  as  linear  combinations  of  the 
orthogonal  basis  (4.1). 


4.1.27. (a) Prove that the polynomials $P_0(t) = 1$, $P_1(t) = t$, $P_2(t) = t^2 - \tfrac13$, $P_3(t) = t^3 - \tfrac35\, t$ form an orthogonal basis for the vector space $\mathcal{P}^{(3)}$ of cubic polynomials for the $L^2$ inner product $\langle f, g \rangle = \int_{-1}^{1} f(t)\, g(t)\, dt$. (b) Find an orthonormal basis of $\mathcal{P}^{(3)}$. (c) Write $t^3$ as a linear combination of $P_0, P_1, P_2, P_3$ using the orthogonal basis formula (4.7).

4.1.28.  (a)  Prove  that  the  polynomials  P0(t)  =  1,  Pj_(t)  =  t  —  P2  (t)  =  t2  —  f  t  +  form  an 
orthogonal  basis  for  V ^  with  respect  to  the  weighted  inner  product 

(  /  ,  g )  =  f(t )  g(t)  tdt.  (b)  Find  the  corresponding  orthonormal  basis. 

(c)  Write  £2  as  a  linear  combination  of  P0,  P1?  P2  using  the  orthogonal  basis  formula  (4.7). 

4.1.29. Write the following trigonometric polynomials in terms of the basis functions (4.11):

(a) $\cos^2 x$, (b) $\cos x \sin x$, (c) $\sin^3 x$, (d) $\cos^2 x \sin^3 x$, (e) $\cos^4 x$.

Hint: You can use complex exponentials to simplify the inner product integrals.

4.1.30. Write down an orthonormal basis of the space of trigonometric polynomials with respect to the $L^2$ inner product $\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\, g(x)\, dx$.

0 4.1.31. Show that the $2n + 1$ complex exponentials $e^{i k x}$ for $k = -n, -n + 1, \ldots, -1, 0, 1, \ldots, n$, form an orthonormal basis for the space of complex-valued trigonometric polynomials under the Hermitian inner product $\langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, \overline{g(x)}\, dx$.

0  4.1.32.  Prove  the  trigonometric  integral  identities  (4.13).  Hint :  You  can  either  use  a 

trigonometric  summation  identity,  or,  if  you  can’t  remember  the  right  one,  use  Euler’s 
formula  (3.94)  to  rewrite  sine  and  cosine  as  combinations  of  complex  exponentials. 

0  4.1.33.  Fill  in  the  complete  details  of  the  proof  of  Theorem  4.9. 


4.2 The Gram-Schmidt Process

Once  we  become  convinced  of  the  utility  of  orthogonal  and  orthonormal  bases,  a  natural 
question  arises:  How  can  we  construct  them?  A  practical  algorithm  was  first  discovered 
by  the  French  mathematician  Pierre-Simon  Laplace  in  the  eighteenth  century.  Today  the 
algorithm  is  known  as  the  Gram-Schmidt  process ,  after  its  rediscovery  by  Gram,  whom 
we already met in Chapter 3, and the twentieth-century German mathematician Erhard
Schmidt. The Gram-Schmidt process is one of the premier algorithms of applied and
computational linear algebra.

Let $W$ denote a finite-dimensional inner product space. (To begin with, you might wish to think of $W$ as a subspace of $\mathbb{R}^m$, equipped with the standard Euclidean dot product, although the algorithm will be formulated in complete generality.) We assume that we already know some basis $w_1, \ldots, w_n$ of $W$, where $n = \dim W$. Our goal is to use this information to construct an orthogonal basis $v_1, \ldots, v_n$.

We will construct the orthogonal basis elements one by one. Since initially we are not worrying about normality, there are no conditions on the first orthogonal basis element $v_1$, and so there is no harm in choosing
$$v_1 = w_1.$$
Note that $v_1 \neq 0$, since $w_1$ appears in the original basis. Starting with $w_2$, the second basis vector $v_2$ must be orthogonal to the first: $\langle v_2, v_1 \rangle = 0$. Let us try to arrange this by subtracting a suitable multiple of $v_1$, and set
$$v_2 = w_2 - c\, v_1,$$
where $c$ is a scalar to be determined. The orthogonality condition
$$0 = \langle v_2, v_1 \rangle = \langle w_2, v_1 \rangle - c\, \langle v_1, v_1 \rangle = \langle w_2, v_1 \rangle - c\, \| v_1 \|^2$$
requires that $c = \langle w_2, v_1 \rangle / \| v_1 \|^2$, and therefore
$$v_2 = w_2 - \frac{\langle w_2, v_1 \rangle}{\| v_1 \|^2}\, v_1. \tag{4.17}$$
Linear independence of $v_1 = w_1$ and $w_2$ ensures that $v_2 \neq 0$. (Check!)
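Next, we construct
$$v_3 = w_3 - c_1 v_1 - c_2 v_2$$
by subtracting suitable multiples of the first two orthogonal basis elements from $w_3$. We want $v_3$ to be orthogonal to both $v_1$ and $v_2$. Since we already arranged that $\langle v_1, v_2 \rangle = 0$, this requires
$$0 = \langle v_3, v_1 \rangle = \langle w_3, v_1 \rangle - c_1 \langle v_1, v_1 \rangle, \qquad 0 = \langle v_3, v_2 \rangle = \langle w_3, v_2 \rangle - c_2 \langle v_2, v_2 \rangle,$$
and hence
$$c_1 = \frac{\langle w_3, v_1 \rangle}{\| v_1 \|^2}, \qquad c_2 = \frac{\langle w_3, v_2 \rangle}{\| v_2 \|^2}.$$
Therefore, the next orthogonal basis vector is given by the formula
$$v_3 = w_3 - \frac{\langle w_3, v_1 \rangle}{\| v_1 \|^2}\, v_1 - \frac{\langle w_3, v_2 \rangle}{\| v_2 \|^2}\, v_2.$$
Since $v_1$ and $v_2$ are linear combinations of $w_1$ and $w_2$, we must have $v_3 \neq 0$, since otherwise, this would imply that $w_1, w_2, w_3$ are linearly dependent, and hence could not come from a basis.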

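Continuing in the same manner, suppose we have already constructed the mutually orthogonal
vectors $\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}$ as linear combinations of $\mathbf{w}_1, \ldots, \mathbf{w}_{k-1}$. The next orthogonal
basis element $\mathbf{v}_k$ will be obtained from $\mathbf{w}_k$ by subtracting off a suitable linear combination
of the previous orthogonal basis elements:

$$ \mathbf{v}_k = \mathbf{w}_k - c_1 \mathbf{v}_1 - \cdots - c_{k-1} \mathbf{v}_{k-1}. $$

Since $\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}$ are already orthogonal, the orthogonality constraint

$$ 0 = \langle \mathbf{v}_k, \mathbf{v}_j \rangle = \langle \mathbf{w}_k, \mathbf{v}_j \rangle - c_j \langle \mathbf{v}_j, \mathbf{v}_j \rangle $$

requires

$$ c_j = \frac{\langle \mathbf{w}_k, \mathbf{v}_j \rangle}{\|\mathbf{v}_j\|^2} \qquad \text{for} \qquad j = 1, \ldots, k-1. \tag{4.18} $$

In this fashion, we establish the general Gram-Schmidt formula

$$ \mathbf{v}_k = \mathbf{w}_k - \sum_{j=1}^{k-1} \frac{\langle \mathbf{w}_k, \mathbf{v}_j \rangle}{\|\mathbf{v}_j\|^2}\,\mathbf{v}_j, \qquad k = 1, \ldots, n. \tag{4.19} $$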


The iterative Gram-Schmidt process (4.19), where we start with $\mathbf{v}_1 = \mathbf{w}_1$ and successively
construct $\mathbf{v}_2, \ldots, \mathbf{v}_n$, defines an explicit, recursive procedure for constructing the desired
orthogonal basis vectors. If we are actually after an orthonormal basis $\mathbf{u}_1, \ldots, \mathbf{u}_n$, we
merely normalize the resulting orthogonal basis vectors, setting $\mathbf{u}_k = \mathbf{v}_k / \|\mathbf{v}_k\|$ for each
$k = 1, \ldots, n$.
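As a minimal sketch of the recursion (4.19), assuming the standard dot product and exact rational arithmetic, the construction can be written in a few lines of Python. The names `dot` and `gram_schmidt` are illustrative choices rather than anything fixed by the text, and the test vectors are those used in Example 4.13.

```python
from fractions import Fraction

def dot(x, y):
    """Standard Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(ws, inner=dot):
    """Orthogonalize the vectors in `ws` via formula (4.19), without normalizing.

    `inner` may be replaced by any other inner product (an assumption of this
    sketch: it must accept two sequences and return a number).
    """
    vs = []
    for w in ws:
        v = list(w)
        for u in vs:
            # c_j = <w_k, v_j> / ||v_j||^2, as in (4.18)
            c = Fraction(inner(w, u)) / inner(u, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        if all(vi == 0 for vi in v):
            # A zero vector signals that the inputs were linearly dependent.
            raise ValueError("input vectors are linearly dependent")
        vs.append(v)
    return vs

ws = [[1, 1, -1], [1, 0, 2], [2, -2, 3]]
vs = gram_schmidt(ws)
for v in vs:
    print([str(entry) for entry in v])
```

Exact fractions are used here only so that the output can be compared with a hand computation; dividing each resulting vector by its length would give the orthonormal basis.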


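Example 4.13. The vectors

$$ \mathbf{w}_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \qquad \mathbf{w}_2 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \qquad \mathbf{w}_3 = \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix}, \tag{4.20} $$

are readily seen to form a basis† of $\mathbb{R}^3$. To construct an orthogonal basis (with respect to
the standard dot product) using the Gram-Schmidt process, we begin by setting

$$ \mathbf{v}_1 = \mathbf{w}_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}. $$

The next basis vector is

$$ \mathbf{v}_2 = \mathbf{w}_2 - \frac{\mathbf{w}_2 \cdot \mathbf{v}_1}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} - \frac{-1}{3} \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} = \begin{pmatrix} \frac43 \\ \frac13 \\ \frac53 \end{pmatrix}. $$

The last orthogonal basis vector is

$$ \mathbf{v}_3 = \mathbf{w}_3 - \frac{\mathbf{w}_3 \cdot \mathbf{v}_1}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 - \frac{\mathbf{w}_3 \cdot \mathbf{v}_2}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2 = \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix} - \frac{-3}{3} \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} - \frac{7}{\frac{14}{3}} \begin{pmatrix} \frac43 \\ \frac13 \\ \frac53 \end{pmatrix} = \begin{pmatrix} 1 \\ -\frac32 \\ -\frac12 \end{pmatrix}. $$

The reader can easily validate the orthogonality of $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$.

An orthonormal basis is obtained by dividing each vector by its length. Since

$$ \|\mathbf{v}_1\| = \sqrt{3}, \qquad \|\mathbf{v}_2\| = \sqrt{\tfrac{14}{3}}, \qquad \|\mathbf{v}_3\| = \sqrt{\tfrac{7}{2}}, $$

we obtain the orthonormal basis vectors

$$ \mathbf{u}_1 = \begin{pmatrix} \frac{1}{\sqrt3} \\ \frac{1}{\sqrt3} \\ -\frac{1}{\sqrt3} \end{pmatrix}, \qquad \mathbf{u}_2 = \begin{pmatrix} \frac{4}{\sqrt{42}} \\ \frac{1}{\sqrt{42}} \\ \frac{5}{\sqrt{42}} \end{pmatrix}, \qquad \mathbf{u}_3 = \begin{pmatrix} \frac{2}{\sqrt{14}} \\ -\frac{3}{\sqrt{14}} \\ -\frac{1}{\sqrt{14}} \end{pmatrix}. \tag{4.21} $$

† This will, in fact, be a consequence of the successful completion of the Gram-Schmidt process
and does not need to be checked in advance. If the given vectors were not linearly independent, then
eventually one of the Gram-Schmidt vectors would vanish, $\mathbf{v}_k = \mathbf{0}$, and the iterative algorithm
would break down.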


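Example 4.14. Here is a typical problem: find an orthonormal basis, with respect to
the dot product, for the subspace $W \subset \mathbb{R}^4$ consisting of all vectors that are orthogonal to
the given vector $\mathbf{a} = (1, 2, -1, -3)^T$. The first task is to find a basis for the subspace.
Now, a vector $\mathbf{x} = (x_1, x_2, x_3, x_4)^T$ is orthogonal to $\mathbf{a}$ if and only if

$$ \mathbf{x} \cdot \mathbf{a} = x_1 + 2x_2 - x_3 - 3x_4 = 0. $$

Solving this homogeneous linear system by the usual method, we observe that the free
variables are $x_2, x_3, x_4$, and so a (non-orthogonal) basis for the subspace is

$$ \mathbf{w}_1 = \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \qquad \mathbf{w}_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad \mathbf{w}_3 = \begin{pmatrix} 3 \\ 0 \\ 0 \\ 1 \end{pmatrix}. $$

To obtain an orthogonal basis, we apply the Gram-Schmidt process. First,

$$ \mathbf{v}_1 = \mathbf{w}_1 = \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}. $$

The next element is

$$ \mathbf{v}_2 = \mathbf{w}_2 - \frac{\mathbf{w}_2 \cdot \mathbf{v}_1}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} - \frac{-2}{5} \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \frac15 \\ \frac25 \\ 1 \\ 0 \end{pmatrix}. $$

The last element of our orthogonal basis is

$$ \mathbf{v}_3 = \mathbf{w}_3 - \frac{\mathbf{w}_3 \cdot \mathbf{v}_1}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 - \frac{\mathbf{w}_3 \cdot \mathbf{v}_2}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2 = \begin{pmatrix} 3 \\ 0 \\ 0 \\ 1 \end{pmatrix} - \frac{-6}{5} \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} - \frac{\frac35}{\frac65} \begin{pmatrix} \frac15 \\ \frac25 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \frac12 \\ 1 \\ -\frac12 \\ 1 \end{pmatrix}. $$

An orthonormal basis can then be obtained by dividing each $\mathbf{v}_i$ by its length:

$$ \mathbf{u}_1 = \begin{pmatrix} -\frac{2}{\sqrt5} \\ \frac{1}{\sqrt5} \\ 0 \\ 0 \end{pmatrix}, \qquad \mathbf{u}_2 = \begin{pmatrix} \frac{1}{\sqrt{30}} \\ \frac{2}{\sqrt{30}} \\ \frac{5}{\sqrt{30}} \\ 0 \end{pmatrix}, \qquad \mathbf{u}_3 = \begin{pmatrix} \frac{1}{\sqrt{10}} \\ \frac{2}{\sqrt{10}} \\ -\frac{1}{\sqrt{10}} \\ \frac{2}{\sqrt{10}} \end{pmatrix}. \tag{4.22} $$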


Remark.  The  orthonormal  basis  produced  by  the  Gram-Schmidt  process  depends  on  the 
order  of  the  vectors  in  the  original  basis.  Different  orderings  produce  different  orthonormal 
bases. 
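The dependence on ordering is easy to see numerically. The following sketch, which assumes the Euclidean dot product and uses a hypothetical helper named `orthonormalize`, runs the process on the same two vectors of $\mathbb{R}^2$ in both orders.

```python
import math

def orthonormalize(ws):
    """Gram-Schmidt followed by normalization, for the Euclidean dot product."""
    us = []
    for w in ws:
        v = [float(x) for x in w]
        for u in us:
            c = sum(a * b for a, b in zip(w, u))   # <w, u>, with ||u|| = 1
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(sum(x * x for x in v))
        us.append([x / norm for x in v])
    return us

w1, w2 = [1, 0], [1, 1]
basis_a = orthonormalize([w1, w2])   # start from w1
basis_b = orthonormalize([w2, w1])   # start from w2
print(basis_a)
print(basis_b)
```

Starting from `w1` reproduces the standard basis, while starting from `w2` gives a basis whose first vector points along the diagonal; both are orthonormal bases of the same space.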
The Gram-Schmidt process has one final important consequence. According to Theorem 2.29,
every finite-dimensional vector space, except $\{\mathbf{0}\}$, admits a basis. Given
an inner product, the Gram-Schmidt process enables one to construct an orthogonal and
even orthonormal basis of the space. Therefore, we have, in fact, implemented a constructive
proof of the existence of orthogonal and orthonormal bases of an arbitrary finite-dimensional
inner product space.

Theorem 4.15. Every non-zero finite-dimensional inner product space has an orthonormal
basis.

In fact, if its dimension is $> 1$, then the inner product space has infinitely many
orthonormal bases.


Exercises 


Note: For Exercises #1-7 use the Euclidean dot product on $\mathbb{R}^n$.

4.2.1. Use the Gram-Schmidt process to determine an orthonormal basis for $\mathbb{R}^3$ starting with
the following sets of vectors:

/1\  / 1 \  /-1\ 

(a)  0  , 

w 


2 

1 ) 


(b) 


0 
1 


1\ 
0 


V-i /  V-o 


(c) 


m 
2 


4\ 
5 


2  \ 
3 


\3  /  VO  J  \-lJ 


4.2.2.  Use  the  Gram-Schmidt  process  to  construct  an  orthonormal  basis  for 


the  following  sets  of  vectors:  (a)  ( 1,  0, 1,  0  )T  ,  ( 0, 1,  0 

- 1  )T  1  ( i>  0, 0, 1  )T ,  ( 1, 

(0  ( 1, 0, 0, 1  y ,  ( 4, 1, 0, 0  y ,  ( 1, 0, 2, 1  y ,  ( 0, 2, 0, 1  )T . 

/  1\ 

o\ 

(  2\ 

4.2.3.  Try  the  Gram-Schmidt  procedure  on  the  vectors 

-1 

0 

-1 

1 

1 

-1 

-1 

l  1 J 

2  / 

^  0 ) 

starting  with 

-i  \T 


(  2\ 
2 

-2 

V  1/ 


What  happens?  Can  you  explain  why  you  are  unable  to  complete  the  algorithm? 

4.2.4. Use the Gram-Schmidt process to construct an orthonormal basis for the following
subspaces of $\mathbb{R}^3$: (a) the plane spanned by $(0, 2, 1)^T$, $(1, -2, -1)^T$; (b) the plane defined
by the equation $2x - y + 3z = 0$; (c) the set of all vectors orthogonal to $(1, -1, -2)^T$.

4.2.5. Find an orthogonal basis of the subspace spanned by the vectors $\mathbf{w}_1 = (1, -1, -1, 1, 1)^T$,
$\mathbf{w}_2 = (2, 1, 4, -4, 2)^T$, and $\mathbf{w}_3 = (5, -4, -3, 7, 1)^T$.

4.2.6.  Find  an  orthonormal  basis  for  the  following  subspaces  of  R4:  (a)  the  span  of  the  vectors 

/  1\ 


1 

-1 
V  0 ) 


(~l\ 

/  2  \ 

0 

-1 

1 

5 

2 

l  l) 

1  1/ 

;  (b)  the  kernel  of  the  matrix  (  ^  2  1  1 


(c)  the  coimage 


of  the  preceding  matrix;  (d)  the  image  of  the  matrix 


/  1 
2 
0 

V-2 


2  2  \ 

4  1 

0  -1 

4  5  J 


(e)  the  cokernel 


T 

of  the  preceding  matrix;  (f)  the  set  of  all  vectors  orthogonal  to  ( 1, 1,  — 1,  — 1 )  . 

4.2.7.  Find  orthonormal  bases  for  the  four  fundamental  subspaces  associated  with  the  following 
matrices:  /  ^  2  1  \ 


(a) 


1 

3 


1 

3 


(*>) 


(c) 


/  1  0  1  0\ 
1111 
V-i  201/ 


(d) 


0  -2  1 
-1  0  -2 
V  1  -2  3  J 
4.2.8.  Construct  an  orthonormal  basis  of

_T  I  3  0 

0  5 


for  the  nonstandard  inner  products 


r 

(a)  <  x ,  y  >  =  x' 


y.  0)  (x,y)  =  x 


T 


y,  (c)  (x,y)  =  x 


T 


O 

4.2.9.  Construct  an  orthonormal  basis  for  R  with  respect  to  the  inner  products  defined  by  the 


(  4 

-2 

0\ 

(  3 

-1  1\ 

following  positive  definite  matrices:  (a)  —2 

3 

-1 

,  (b)  I 

4  -2 

V  o 

-1 

2  J 

V  1 

-2  4/ 

4.2.10. Redo Exercise 4.2.1 using
(i) the weighted inner product $\langle \mathbf{v}, \mathbf{w} \rangle = 3 v_1 w_1 + 2 v_2 w_2 + v_3 w_3$;
(ii) the inner product induced by the positive definite matrix
$$ K = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}. $$

♦ 4.2.11. (a) How many orthonormal bases does $\mathbb{R}$ have? (b) What about $\mathbb{R}^2$? (c) Does your
answer change if you use a different inner product? Justify your answers.


4.2.12.  True  or  false:  Reordering  the  original  basis  before  starting  the  Gram-Schmidt  process 
leads  to  the  same  orthogonal  basis. 

♦ 4.2.13. Suppose that $W \subset \mathbb{R}^n$ is a proper subspace, and $\mathbf{u}_1, \ldots, \mathbf{u}_m$ forms an orthonormal
basis of $W$. Prove that there exist vectors $\mathbf{u}_{m+1}, \ldots, \mathbf{u}_n \in \mathbb{R}^n \setminus W$ such that the complete
collection $\mathbf{u}_1, \ldots, \mathbf{u}_n$ forms an orthonormal basis for $\mathbb{R}^n$. Hint: Begin with Exercise 2.4.20.

♦ 4.2.14. Verify that the Gram-Schmidt formula (4.19) also produces an orthogonal basis of a
complex vector space under a Hermitian inner product.

4.2.15. (a) Apply the complex Gram-Schmidt algorithm from Exercise 4.2.14 to produce an
orthonormal basis starting with the vectors $(1 + i, 1 - i)^T$, $(1 - 2i, 5i)^T \in \mathbb{C}^2$.
(b) Do the same for $(1 + i, 1 - i, 2 - i)^T$, $(1 + 2i, -2i, 2 - i)^T$, $(1, 1 - 2i, i)^T \in \mathbb{C}^3$.

4.2.16. Use the complex Gram-Schmidt algorithm from Exercise 4.2.14 to construct
orthonormal bases for (a) the subspace spanned by $(1 - i, 1, 0)^T$, $(0, 3 - i, 2i)^T$;

(b)  the  set  of  solutions  to  (2—  \  )x  —  2\y  (1  —  2i)z  =  0; 

(c)  the  subspace  spanned  by  (  —  i ,  1,  —1,  i  )T  ,  ( 0,  2  i ,  1  —  i ,  —1  +  i  )T  ,  ( 1,  i ,  —  i ,  1  —  2  i  )T  . 


Modifications  of  the  Gram-Schmidt  Process 


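With the basic Gram-Schmidt algorithm now in hand, it is worth looking at a couple of
reformulations that have both practical and theoretical advantages. The first can be used
to construct the orthonormal basis vectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$ directly from the basis $\mathbf{w}_1, \ldots, \mathbf{w}_n$.

We begin by replacing each orthogonal basis vector in the basic Gram-Schmidt formula (4.19)
by its normalized version $\mathbf{u}_j = \mathbf{v}_j / \|\mathbf{v}_j\|$. The original basis vectors can be
expressed in terms of the orthonormal basis via a “triangular” system

$$ \begin{aligned} \mathbf{w}_1 &= r_{11}\mathbf{u}_1, \\ \mathbf{w}_2 &= r_{12}\mathbf{u}_1 + r_{22}\mathbf{u}_2, \\ \mathbf{w}_3 &= r_{13}\mathbf{u}_1 + r_{23}\mathbf{u}_2 + r_{33}\mathbf{u}_3, \\ &\;\;\vdots \\ \mathbf{w}_n &= r_{1n}\mathbf{u}_1 + r_{2n}\mathbf{u}_2 + \cdots + r_{nn}\mathbf{u}_n. \end{aligned} \tag{4.23} $$

The coefficients $r_{ij}$ can, in fact, be computed directly from these formulas. Indeed, taking
the inner product of the equation for $\mathbf{w}_j$ with the orthonormal basis vector $\mathbf{u}_i$ for $i \le j$,
we obtain, in view of the orthonormality constraints (4.6),

$$ \langle \mathbf{w}_j, \mathbf{u}_i \rangle = \langle r_{1j}\mathbf{u}_1 + \cdots + r_{jj}\mathbf{u}_j, \mathbf{u}_i \rangle = r_{1j}\langle \mathbf{u}_1, \mathbf{u}_i \rangle + \cdots + r_{jj}\langle \mathbf{u}_j, \mathbf{u}_i \rangle = r_{ij}, $$

and hence

$$ r_{ij} = \langle \mathbf{w}_j, \mathbf{u}_i \rangle. \tag{4.24} $$

On the other hand, according to (4.5),

$$ \|\mathbf{w}_j\|^2 = \|r_{1j}\mathbf{u}_1 + \cdots + r_{jj}\mathbf{u}_j\|^2 = r_{1j}^2 + \cdots + r_{j-1,j}^2 + r_{jj}^2. \tag{4.25} $$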


The pair of equations (4.24-25) can be rearranged to devise a recursive procedure to compute
the orthonormal basis. We begin by setting $r_{11} = \|\mathbf{w}_1\|$ and so $\mathbf{u}_1 = \mathbf{w}_1 / r_{11}$. At
each subsequent stage $j \ge 2$, we assume that we have already constructed $\mathbf{u}_1, \ldots, \mathbf{u}_{j-1}$.
We then compute

$$ r_{ij} = \langle \mathbf{w}_j, \mathbf{u}_i \rangle \qquad \text{for each} \qquad i = 1, \ldots, j-1. \tag{4.26} $$

We obtain the next orthonormal basis vector $\mathbf{u}_j$ by computing

$$ r_{jj} = \sqrt{\|\mathbf{w}_j\|^2 - r_{1j}^2 - \cdots - r_{j-1,j}^2}, \qquad \mathbf{u}_j = \frac{\mathbf{w}_j - r_{1j}\mathbf{u}_1 - \cdots - r_{j-1,j}\mathbf{u}_{j-1}}{r_{jj}}. \tag{4.27} $$

Running through the formulas (4.26-27) for $j = 1, \ldots, n$ leads to the same orthonormal
basis $\mathbf{u}_1, \ldots, \mathbf{u}_n$ produced by the previous version of the Gram-Schmidt procedure.

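Example 4.16. Let us apply the revised algorithm to the vectors

$$ \mathbf{w}_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \qquad \mathbf{w}_2 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \qquad \mathbf{w}_3 = \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix}, $$

of Example 4.13. To begin, we set

$$ r_{11} = \|\mathbf{w}_1\| = \sqrt3, \qquad \mathbf{u}_1 = \frac{\mathbf{w}_1}{r_{11}} = \begin{pmatrix} \frac{1}{\sqrt3} \\ \frac{1}{\sqrt3} \\ -\frac{1}{\sqrt3} \end{pmatrix}. $$

The next step is to compute

$$ r_{12} = \langle \mathbf{w}_2, \mathbf{u}_1 \rangle = -\frac{1}{\sqrt3}, \qquad r_{22} = \sqrt{\|\mathbf{w}_2\|^2 - r_{12}^2} = \sqrt{\tfrac{14}{3}}, \qquad \mathbf{u}_2 = \frac{\mathbf{w}_2 - r_{12}\mathbf{u}_1}{r_{22}} = \begin{pmatrix} \frac{4}{\sqrt{42}} \\ \frac{1}{\sqrt{42}} \\ \frac{5}{\sqrt{42}} \end{pmatrix}. $$

The final step yields

$$ r_{13} = \langle \mathbf{w}_3, \mathbf{u}_1 \rangle = -\sqrt3, \qquad r_{23} = \langle \mathbf{w}_3, \mathbf{u}_2 \rangle = \sqrt{\tfrac{21}{2}}, \qquad r_{33} = \sqrt{\|\mathbf{w}_3\|^2 - r_{13}^2 - r_{23}^2} = \sqrt{\tfrac72}, $$

$$ \mathbf{u}_3 = \frac{\mathbf{w}_3 - r_{13}\mathbf{u}_1 - r_{23}\mathbf{u}_2}{r_{33}} = \begin{pmatrix} \frac{2}{\sqrt{14}} \\ -\frac{3}{\sqrt{14}} \\ -\frac{1}{\sqrt{14}} \end{pmatrix}. $$

As advertised, the result is the same orthonormal basis vectors that we previously found
in Example 4.13.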
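The recursion (4.26-27) translates almost line by line into code. The following is a minimal sketch in floating-point arithmetic, assuming the Euclidean dot product; the function name `gram_schmidt_qr` and the 0-based storage of the coefficients in `r` are choices made here for illustration only.

```python
import math

def dot(x, y):
    """Standard Euclidean dot product."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt_qr(ws):
    """Return (us, r), where r[i][j] holds the coefficient r_{ij} (0-based)."""
    n = len(ws)
    r = [[0.0] * n for _ in range(n)]
    us = []
    for j, w in enumerate(ws):
        for i in range(j):
            r[i][j] = dot(w, us[i])                                    # (4.26)
        # r_jj from ||w_j||^2 minus the squares of the earlier coefficients
        r[j][j] = math.sqrt(dot(w, w) - sum(r[i][j] ** 2 for i in range(j)))
        v = [float(x) for x in w]
        for i in range(j):
            v = [vk - r[i][j] * uk for vk, uk in zip(v, us[i])]
        us.append([vk / r[j][j] for vk in v])                          # (4.27)
    return us, r

us, r = gram_schmidt_qr([[1, 1, -1], [1, 0, 2], [2, -2, 3]])
for row in r:
    print(row)
```

Run on the vectors of Example 4.13, the nonzero entries of `r` agree, up to rounding, with the values $r_{11}, r_{12}, \ldots, r_{33}$ computed by hand in Example 4.16.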

For hand computations, the original version (4.19) of the Gram-Schmidt process is
slightly easier (even if one does ultimately want an orthonormal basis), since it avoids
the square roots that are ubiquitous in the orthonormal version (4.26-27). On the other
hand, for numerical implementation on a computer, the orthonormal version is a bit faster,
since it involves fewer arithmetic operations.

However, in practical, large-scale computations, both versions of the Gram-Schmidt
process suffer from a serious flaw. They are subject to numerical instabilities, and so
accumulating round-off errors may seriously corrupt the computations, leading to inaccurate,
non-orthogonal vectors. Fortunately, there is a simple rearrangement of the calculation
that ameliorates this difficulty and leads to the numerically robust algorithm that is most
often used in practice, [21, 40, 66]. The idea is to treat the vectors simultaneously rather
than sequentially, making full use of the orthonormal basis vectors as they arise. More
specifically, the algorithm begins as before: we take $\mathbf{u}_1 = \mathbf{w}_1 / \|\mathbf{w}_1\|$. We then subtract
off the appropriate multiples of $\mathbf{u}_1$ from all of the remaining basis vectors so as to arrange
their orthogonality to $\mathbf{u}_1$. This is accomplished by setting

$$ \mathbf{w}_k^{(2)} = \mathbf{w}_k - \langle \mathbf{w}_k, \mathbf{u}_1 \rangle\,\mathbf{u}_1 \qquad \text{for} \qquad k = 2, \ldots, n. $$

The second orthonormal basis vector $\mathbf{u}_2 = \mathbf{w}_2^{(2)} / \|\mathbf{w}_2^{(2)}\|$ is then obtained by normalizing.
We next modify the remaining $\mathbf{w}_3^{(2)}, \ldots, \mathbf{w}_n^{(2)}$ to produce vectors

$$ \mathbf{w}_k^{(3)} = \mathbf{w}_k^{(2)} - \langle \mathbf{w}_k^{(2)}, \mathbf{u}_2 \rangle\,\mathbf{u}_2, \qquad k = 3, \ldots, n, $$

that are orthogonal to both $\mathbf{u}_1$ and $\mathbf{u}_2$. Then $\mathbf{u}_3 = \mathbf{w}_3^{(3)} / \|\mathbf{w}_3^{(3)}\|$ is the next orthonormal
basis element, and the process continues. The full algorithm starts with the initial basis
vectors $\mathbf{w}_j = \mathbf{w}_j^{(1)}$, $j = 1, \ldots, n$, and then recursively computes

$$ \mathbf{u}_j = \frac{\mathbf{w}_j^{(j)}}{\|\mathbf{w}_j^{(j)}\|}, \qquad \mathbf{w}_k^{(j+1)} = \mathbf{w}_k^{(j)} - \langle \mathbf{w}_k^{(j)}, \mathbf{u}_j \rangle\,\mathbf{u}_j, \qquad j = 1, \ldots, n, \quad k = j+1, \ldots, n. \tag{4.28} $$

(In the final phase, when $j = n$, the second formula is no longer needed.) The result is a
numerically stable computation of the same orthonormal basis vectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$.
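To illustrate the difference in floating-point behavior, here is a hedged sketch that implements both the sequential version, in which each original $\mathbf{w}_k$ is projected onto the earlier $\mathbf{u}_j$, and the rearranged version (4.28). The function names and the nearly dependent test vectors are assumptions chosen for this illustration and do not come from the text.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def sequential_gs(ws):
    """Orthonormalize by projecting each ORIGINAL w_k onto the earlier u_j."""
    us = []
    for w in ws:
        v = [float(x) for x in w]
        for u in us:
            c = dot(w, u)
            v = [a - c * b for a, b in zip(v, u)]
        nrm = math.sqrt(dot(v, v))
        us.append([a / nrm for a in v])
    return us

def stable_gs(ws):
    """Orthonormalize via (4.28): update all remaining vectors at each stage."""
    work = [[float(x) for x in w] for w in ws]      # w_k^(1) = w_k
    us = []
    for j in range(len(work)):
        nrm = math.sqrt(dot(work[j], work[j]))
        u = [a / nrm for a in work[j]]              # u_j = w_j^(j) / ||w_j^(j)||
        us.append(u)
        for k in range(j + 1, len(work)):
            c = dot(work[k], u)                     # <w_k^(j), u_j>
            work[k] = [a - c * b for a, b in zip(work[k], u)]
    return us

eps = 1e-8                                          # makes the vectors nearly parallel
ws = [[1, eps, 0, 0], [1, 0, eps, 0], [1, 0, 0, eps]]
us_sequential = sequential_gs(ws)
us_stable = stable_gs(ws)
print(abs(dot(us_sequential[1], us_sequential[2])))  # far from zero
print(abs(dot(us_stable[1], us_stable[2])))          # close to zero
```

In double precision, the second and third vectors returned by the sequential version are visibly non-orthogonal for this input, whereas the rearranged version keeps their dot product near machine precision. In exact arithmetic the two procedures return the same vectors.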

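Example 4.17. Let us apply the stable Gram-Schmidt process to the basis vectors

$$ \mathbf{w}_1^{(1)} = \mathbf{w}_1 = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix}, \qquad \mathbf{w}_2^{(1)} = \mathbf{w}_2 = \begin{pmatrix} 0 \\ 4 \\ -1 \end{pmatrix}, \qquad \mathbf{w}_3^{(1)} = \mathbf{w}_3 = \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix}. $$

The first orthonormal basis vector is

$$ \mathbf{u}_1 = \frac{\mathbf{w}_1^{(1)}}{\|\mathbf{w}_1^{(1)}\|} = \begin{pmatrix} \frac23 \\ \frac23 \\ -\frac13 \end{pmatrix}. $$

Next, we compute

$$ \mathbf{w}_2^{(2)} = \mathbf{w}_2^{(1)} - \langle \mathbf{w}_2^{(1)}, \mathbf{u}_1 \rangle\,\mathbf{u}_1 = \begin{pmatrix} -2 \\ 2 \\ 0 \end{pmatrix}, \qquad \mathbf{w}_3^{(2)} = \mathbf{w}_3^{(1)} - \langle \mathbf{w}_3^{(1)}, \mathbf{u}_1 \rangle\,\mathbf{u}_1 = \begin{pmatrix} -1 \\ 0 \\ -2 \end{pmatrix}. $$

The second orthonormal basis vector is

$$ \mathbf{u}_2 = \frac{\mathbf{w}_2^{(2)}}{\|\mathbf{w}_2^{(2)}\|} = \begin{pmatrix} -\frac{1}{\sqrt2} \\ \frac{1}{\sqrt2} \\ 0 \end{pmatrix}. $$

Finally,

$$ \mathbf{w}_3^{(3)} = \mathbf{w}_3^{(2)} - \langle \mathbf{w}_3^{(2)}, \mathbf{u}_2 \rangle\,\mathbf{u}_2 = \begin{pmatrix} -\frac12 \\ -\frac12 \\ -2 \end{pmatrix}, \qquad \mathbf{u}_3 = \frac{\mathbf{w}_3^{(3)}}{\|\mathbf{w}_3^{(3)}\|} = \begin{pmatrix} -\frac{\sqrt2}{6} \\ -\frac{\sqrt2}{6} \\ -\frac{2\sqrt2}{3} \end{pmatrix}. $$

The resulting vectors $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$ form the desired orthonormal basis.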




Exercises 

4.2.17.  Use  the  modified  Gram-Schmidt  process  (4.26-27)  to  produce  orthonormal  bases  for  the 


spaces  spanned  by  the  following  vectors:  (a) 


/  — 1  \ 
1 

2/ 


/  — 1  \ 
-1 
1/ 


0\ 

1 

\  3  / 


(0 


/0\ 

1 

VI/ 


/ 1  \ 
o 

W 


/  2  \ 
1 

\0/ 


(c) 


/  1\ 

(- 1\ 

(  2\ 

1 

0 

-1 

-1 

1 

1 

1 

2 

0 ) 

l) 

l  l) 

(d) 


/2\ 

/  o\ 

(  1\ 

/  1  \ 

/  1\ 

/  1\ 

1 

-1 

2 

1 

0 

-1 

0 

3 

5 

2 

5 

-1 

,  (e) 

0 

5 

1 

5 

0 

5 

-1 

1 

-1 

0 

1 

1 

1 

0 

Vo?1 

V  17 

V  i) 

Vo?1 

Vo?1 

V  -i  / 

V  iJ 

4.2.18.  Repeat  Exercise  4.2.17  using  the  numerically  stable  algorithm  (4.28)  and  check  that  you 
get  the  same  result.  Which  of  the  two  algorithms  was  easier  for  you  to  implement? 

4.2.19.  Redo  each  of  the  exercises  in  the  preceding  subsection  by  implementing  the  numerically 
stable  Gram-Schmidt  process  (4.28)  instead,  and  verify  that  you  end  up  with  the  same 
orthonormal  basis. 

0  4.2.20.  Prove  that  (4.28)  does  indeed  produce  an  orthonormal  basis.  Explain  why  the  result  is 
the  same  orthonormal  basis  as  the  ordinary  Gram-Schmidt  method. 

4.2.21. Let $\mathbf{w}_k^{(j)}$ be the vectors in the stable Gram-Schmidt algorithm (4.28). Prove that the
coefficients in (4.23) are given by $r_{ii} = \|\mathbf{w}_i^{(i)}\|$, and $r_{ij} = \langle \mathbf{w}_j^{(i)}, \mathbf{u}_i \rangle$ for $i < j$.


4.3  Orthogonal  Matrices 

Matrices whose columns form an orthonormal basis of $\mathbb{R}^n$ relative to the standard Euclidean
dot  product  play  a  distinguished  role.  Such  “orthogonal  matrices”  appear  in  a  wide  range  of 
applications  in  geometry,  physics,  quantum  mechanics,  crystallography,  partial  differential 
equations,  [61],  symmetry  theory,  [60],  and  special  functions,  [59].  Rotational  motions 
of  bodies  in  three-dimensional  space  are  described  by  orthogonal  matrices,  and  hence  they 
lie  at  the  foundations  of  rigid  body  mechanics,  [31],  including  satellites,  airplanes,  drones, 
and  underwater  vehicles,  as  well  as  three-dimensional  computer  graphics  and  animation  for 
video  games  and  movies,  [5].  Furthermore,  orthogonal  matrices  are  an  essential  ingredient 
in  one  of  the  most  important  methods  of  numerical  linear  algebra:  the  QR  algorithm  for 
computing  eigenvalues  of  matrices,  to  be  presented  in  Section  9.5. 

Definition 4.18. A square matrix $Q$ is called orthogonal if it satisfies

$$ Q^T Q = Q\,Q^T = I. \tag{4.29} $$

The orthogonality condition implies that one can easily invert an orthogonal matrix:

$$ Q^{-1} = Q^T. \tag{4.30} $$

In fact, the two conditions are equivalent, and hence a matrix is orthogonal if and only
if its inverse is equal to its transpose. In particular, the identity matrix $I$ is orthogonal.
Also note that if $Q$ is orthogonal, so is $Q^T$. The second important characterization of
orthogonal matrices relates them directly to orthonormal bases.

Proposition 4.19. A matrix $Q$ is orthogonal if and only if its columns form an orthonormal
basis with respect to the Euclidean dot product on $\mathbb{R}^n$.

Proof: Let $\mathbf{u}_1, \ldots, \mathbf{u}_n$ be the columns of $Q$. Then $\mathbf{u}_1^T, \ldots, \mathbf{u}_n^T$ are the rows of the
transposed matrix $Q^T$. The $(i, j)$ entry of the product $Q^T Q$ is given as the product of the $i$th row
of $Q^T$ and the $j$th column of $Q$. Thus, the orthogonality requirement (4.29) implies

$$ \mathbf{u}_i \cdot \mathbf{u}_j = \mathbf{u}_i^T \mathbf{u}_j = \begin{cases} 1, & i = j, \\ 0, & i \neq j, \end{cases} $$

which are precisely the conditions (4.6) for $\mathbf{u}_1, \ldots, \mathbf{u}_n$ to form an orthonormal basis. Q.E.D.


In particular, the columns of the identity matrix produce the standard basis $\mathbf{e}_1, \ldots, \mathbf{e}_n$
of $\mathbb{R}^n$. Also, the rows of an orthogonal matrix $Q$ produce an (in general different)
orthonormal basis.

Warning. Technically, we should be referring to an “orthonormal” matrix, not an “orthogonal”
matrix. But the terminology is so standard throughout mathematics and physics that
we have no choice but to adopt it here. There is no commonly accepted name for a matrix
whose columns form an orthogonal but not orthonormal basis.
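The distinction drawn in the Warning can be checked directly from the Gram matrix $Q^T Q$, whose $(i, j)$ entry is the dot product of columns $i$ and $j$. The sketch below uses exact fractions; the helper names `gram` and `is_orthogonal` and the two sample matrices are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction as F

def gram(m):
    """Return M^T M; its (i, j) entry is the dot product of columns i and j."""
    cols = list(zip(*m))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def is_orthogonal(m):
    """True when the columns of the square matrix m are orthonormal."""
    n = len(m)
    identity = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    return gram(m) == identity

Q = [[F(3, 5), F(-4, 5)],
     [F(4, 5), F(3, 5)]]      # columns are orthogonal unit vectors
S = [[F(1), F(1)],
     [F(1), F(-1)]]           # columns are orthogonal but have length sqrt(2)

print(is_orthogonal(Q), is_orthogonal(S))
print(gram(S))
```

For `S` the Gram matrix is diagonal but not the identity, which is exactly the "orthogonal but not orthonormal columns" situation that has no standard name.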


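Example 4.20. A $2 \times 2$ matrix $Q = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is orthogonal if and only if its columns
$\mathbf{u}_1 = \begin{pmatrix} a \\ c \end{pmatrix}$, $\mathbf{u}_2 = \begin{pmatrix} b \\ d \end{pmatrix}$ form an orthonormal basis of $\mathbb{R}^2$. Equivalently, the requirement

$$ Q^T Q = \begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a^2 + c^2 & ab + cd \\ ab + cd & b^2 + d^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} $$

implies that its entries must satisfy the algebraic equations

$$ a^2 + c^2 = 1, \qquad ab + cd = 0, \qquad b^2 + d^2 = 1. $$

The first and last equations say that the points $(a, c)^T$ and $(b, d)^T$ lie on the unit circle
in $\mathbb{R}^2$, and so

$$ a = \cos\theta, \qquad c = \sin\theta, \qquad b = \cos\psi, \qquad d = \sin\psi, $$

for some choice of angles $\theta, \psi$. The remaining orthogonality condition is

$$ 0 = ab + cd = \cos\theta\cos\psi + \sin\theta\sin\psi = \cos(\theta - \psi), $$

which implies that $\theta$ and $\psi$ differ by a right angle: $\psi = \theta \pm \tfrac12\pi$. The $\pm$ sign leads to two
cases:

$$ b = -\sin\theta, \quad d = \cos\theta, \qquad \text{or} \qquad b = \sin\theta, \quad d = -\cos\theta. $$

As a result, every $2 \times 2$ orthogonal matrix has one of two possible forms

$$ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \qquad \text{or} \qquad \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}, \qquad \text{where} \quad 0 \le \theta < 2\pi. \tag{4.31} $$

The corresponding orthonormal bases are illustrated in Figure 4.2. The former is a
right-handed basis, as defined in Exercise 2.4.7, and can be obtained from the standard basis
$\mathbf{e}_1, \mathbf{e}_2$ by a rotation through angle $\theta$, while the latter has the opposite, reflected orientation.

Figure 4.2. Orthonormal Bases in $\mathbb{R}^2$.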


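Example 4.21. A $3 \times 3$ orthogonal matrix $Q = (\,\mathbf{u}_1 \ \mathbf{u}_2 \ \mathbf{u}_3\,)$ is prescribed by 3 mutually
perpendicular vectors of unit length in $\mathbb{R}^3$. For instance, the orthonormal basis constructed
in (4.21) corresponds to the orthogonal matrix

$$ Q = \begin{pmatrix} \frac{1}{\sqrt3} & \frac{4}{\sqrt{42}} & \frac{2}{\sqrt{14}} \\ \frac{1}{\sqrt3} & \frac{1}{\sqrt{42}} & -\frac{3}{\sqrt{14}} \\ -\frac{1}{\sqrt3} & \frac{5}{\sqrt{42}} & -\frac{1}{\sqrt{14}} \end{pmatrix}. $$

A complete list of $3 \times 3$ orthogonal matrices can be found in Exercises 4.3.4 and 4.3.5.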


Lemma 4.22. An orthogonal matrix $Q$ has determinant $\det Q = \pm 1$.

Proof: Taking the determinant of (4.29), and using the determinantal formulas (1.85),
(1.89), shows that

$$ 1 = \det I = \det(Q^T Q) = \det Q^T \det Q = (\det Q)^2, $$

which immediately proves the lemma. Q.E.D.

An orthogonal matrix is called proper or special if it has determinant $+1$. Geometrically,
the columns of a proper orthogonal matrix form a right-handed basis of $\mathbb{R}^n$, as defined in
Exercise 2.4.7. An improper orthogonal matrix, with determinant $-1$, corresponds to a
left-handed basis that lives in a mirror-image world.
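As a quick numerical illustration of the proper and improper cases in two dimensions, the sketch below builds one rotation matrix and one reflection matrix of the two forms in (4.31) and evaluates their determinants. The function names and the sample angle are arbitrary choices made for this illustration.

```python
import math

def rotation(theta):
    """First form in (4.31): rotation through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def reflection(theta):
    """Second form in (4.31): the reflected orientation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def gram(m):
    """Return M^T M, which should be the identity for an orthogonal matrix."""
    cols = list(zip(*m))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

theta = 0.7
print(det2(rotation(theta)), det2(reflection(theta)))
print(gram(rotation(theta)))
```

Up to rounding, the determinants come out as $+1$ and $-1$ respectively, and both Gram matrices equal the identity.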

Proposition  4.23.  The  product  of  two  orthogonal  matrices  is  also  orthogonal. 

Proof: If $Q_1^T Q_1 = I = Q_2^T Q_2$, then

$$ (Q_1 Q_2)^T (Q_1 Q_2) = Q_2^T Q_1^T Q_1 Q_2 = Q_2^T Q_2 = I, $$

and so the product matrix $Q_1 Q_2$ is also orthogonal. Q.E.D.
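The closure property can be confirmed on a concrete pair of matrices. The following sketch uses exact rational entries so that the check $P^T P = I$ holds without rounding; the two matrices and the helper names are illustrative assumptions.

```python
from fractions import Fraction as F

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_orthogonal(m):
    n = len(m)
    identity = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    return matmul(transpose(m), m) == identity

Q1 = [[F(3, 5), F(-4, 5)],
      [F(4, 5), F(3, 5)]]
Q2 = [[F(5, 13), F(12, 13)],
      [F(12, 13), F(-5, 13)]]

P = matmul(Q1, Q2)
print([[str(x) for x in row] for row in P])
print(is_orthogonal(Q1), is_orthogonal(Q2), is_orthogonal(P))
```

Here `Q1` and `Q2` are built from the Pythagorean triples (3, 4, 5) and (5, 12, 13), and their product has entries with denominator 65.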

This multiplicative property combined with the fact that the inverse of an orthogonal
matrix is also orthogonal says that the set of all orthogonal matrices forms a group†. The
orthogonal group lies at the foundation of everyday Euclidean geometry, as well as rigid
body mechanics, atomic structure and chemistry, computer graphics and animation, and
many other areas.

† The precise mathematical definition of a group can be found in Exercise 4.3.24. Although
they will not play a significant role in this text, groups underlie the mathematical formalization of
symmetry and, as such, form one of the most fundamental concepts in advanced mathematics and
its applications, particularly quantum mechanics and modern theoretical physics, [54]. Indeed,
according to the mathematician Felix Klein, cf. [92], all geometry is based on group theory.


Exercises 


4.3.1. Determine which of the following matrices are (i) orthogonal; (ii) proper orthogonal.


(a) 


(b) 


12 

5  \ 

( 

0 

1 

0\ 

/ 

1 

3 

2 

3 

2  \ 
3 

13 

5 

13 

12  1  ’ 

(C) 

-1 

0 

0 

»  (d) 

2 

3 

1 

3 

2 

3 

13 

13  / 

K 

0 

0 

-1/ 

V 

2 

3 

2 

3 

"3/ 

/ 1 

2 

1 

3 

I  \ 

4 

/ 

3 

5 

0 

4  \ 

5 

( 

2 

3 

+2 

6 

+2  \ 
2 

0) 

1 

3 

1 

4 

1 

5 

.  (f) 

4 

13 

12 

13 

3 

13 

.  (s) 

2 

3 

V2 

6 

2 

u 

1 

5 

h) 

V 

48 

65 

5 

13 

36 
65  / 

V 

1 

3 

2\/2 

3 

0  ) 

f  1 

0 

0\ 

( 

cos  6 

sin  0 

0\ 

4.3.2.  (a)  Show  that  R  = 

0 

0 

1 

,  a  reflection  matrix,  and  Q  = 

—  sin  0 

cos  0 

0 

1 

K 

0 

0 

1/ 

representing a rotation by the angle $\theta$ around the $z$-axis, are both orthogonal. (b) Verify
that the products $RQ$ and $QR$ are also orthogonal. (c) Which of the preceding matrices,
$R$, $Q$, $RQ$, $QR$, are proper orthogonal?


4.3.3. True or false: (a) If $Q$ is an improper $2 \times 2$ orthogonal matrix, then $Q^2 = I$.
(b) If $Q$ is an improper $3 \times 3$ orthogonal matrix, then $Q^2 = I$.


♥ 4.3.4. (a) Prove that, for all $\theta, \varphi, \psi$,

$$ Q = \begin{pmatrix} \cos\varphi\cos\psi - \cos\theta\sin\varphi\sin\psi & \sin\varphi\cos\psi + \cos\theta\cos\varphi\sin\psi & \sin\theta\sin\psi \\ -\cos\varphi\sin\psi - \cos\theta\sin\varphi\cos\psi & -\sin\varphi\sin\psi + \cos\theta\cos\varphi\cos\psi & \sin\theta\cos\psi \\ \sin\theta\sin\varphi & -\sin\theta\cos\varphi & \cos\theta \end{pmatrix} $$

is a proper orthogonal matrix. (b) Write down a formula for $Q^{-1}$.

Remark. It can be shown that every proper orthogonal matrix can be parameterized
in this manner; $\theta, \varphi, \psi$ are known as the Euler angles, and play an important role in
applications in mechanics and geometry, [31; p. 147].


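♥ 4.3.5. (a) Show that if $y_1^2 + y_2^2 + y_3^2 + y_4^2 = 1$, then the matrix

$$ Q = \begin{pmatrix} y_1^2 + y_2^2 - y_3^2 - y_4^2 & 2\,(y_2 y_3 + y_1 y_4) & 2\,(y_2 y_4 - y_1 y_3) \\ 2\,(y_2 y_3 - y_1 y_4) & y_1^2 - y_2^2 + y_3^2 - y_4^2 & 2\,(y_3 y_4 + y_1 y_2) \\ 2\,(y_2 y_4 + y_1 y_3) & 2\,(y_3 y_4 - y_1 y_2) & y_1^2 - y_2^2 - y_3^2 + y_4^2 \end{pmatrix} $$

is a proper orthogonal matrix. The numbers $y_1, y_2, y_3, y_4$ are known as Cayley-Klein
parameters. (b) Write down a formula for $Q^{-1}$. (c) Prove the formulas

$$ y_1 = \cos\frac{\varphi + \psi}{2}\,\cos\frac{\theta}{2}, \qquad y_2 = \cos\frac{\varphi - \psi}{2}\,\sin\frac{\theta}{2}, \qquad y_3 = \sin\frac{\varphi - \psi}{2}\,\sin\frac{\theta}{2}, \qquad y_4 = \sin\frac{\varphi + \psi}{2}\,\cos\frac{\theta}{2}, $$

relating the Cayley-Klein parameters and the Euler angles of Exercise 4.3.4, cf. [31; §§4-5].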

♦ 4.3.6. (a) Prove that the transpose of an orthogonal matrix is also orthogonal. (b) Explain
why the rows of an $n \times n$ orthogonal matrix also form an orthonormal basis of $\mathbb{R}^n$.

4.3.7. Prove that the inverse of an orthogonal matrix is orthogonal.

4.3.8. Show that if $Q$ is a proper orthogonal matrix, and $R$ is obtained from $Q$ by
interchanging two rows, then $R$ is an improper orthogonal matrix.

4.3.9. Show that the product of two proper orthogonal matrices is also proper orthogonal.
What can you say about the product of two improper orthogonal matrices? What about an
improper times a proper orthogonal matrix?

4.3.10. True or false: (a) A matrix whose columns form an orthogonal basis of $\mathbb{R}^n$ is an
orthogonal matrix. (b) A matrix whose rows form an orthonormal basis of $\mathbb{R}^n$ is an
orthogonal matrix. (c) An orthogonal matrix is symmetric if and only if it is a diagonal
matrix.

4.3.11. Write down all diagonal $n \times n$ orthogonal matrices.

♦ 4.3.12. Prove that an upper triangular matrix $U$ is orthogonal if and only if $U$ is a diagonal
matrix. What are its diagonal entries?

4.3.13.  (a)  Show  that  the  elementary  row  operation  matrix  corresponding  to  the  interchange  of 
two  rows  is  an  improper  orthogonal  matrix,  (b)  Are  there  any  other  orthogonal  elementary 
matrices? 


4.3.14.  True  or  false:  Applying  an  elementary  row  operation  to  an  orthogonal  matrix  produces 
an  orthogonal  matrix. 

4.3.15.  (a)  Prove  that  every  permutation  matrix  is  orthogonal,  (b)  How  many  permutation 
matrices  of  a  given  size  are  proper  orthogonal? 


♦ 4.3.16. (a) Prove that if $Q$ is an orthogonal matrix, then $\|Q\mathbf{x}\| = \|\mathbf{x}\|$ for every vector $\mathbf{x} \in \mathbb{R}^n$,
where $\|\cdot\|$ denotes the standard Euclidean norm. (b) Prove the converse: if $\|Q\mathbf{x}\| = \|\mathbf{x}\|$
for all $\mathbf{x} \in \mathbb{R}^n$, then $Q$ is an orthogonal matrix.

♦ 4.3.17. Show that if $A^T = -A$ is any skew-symmetric matrix, then its Cayley Transform
$Q = (I - A)^{-1}(I + A)$ is an orthogonal matrix. Can you prove that $I - A$ is always
invertible?

4.3.18. Suppose $S$ is an $n \times n$ matrix whose columns form an orthogonal, but not orthonormal,
basis of $\mathbb{R}^n$. (a) Find a formula for $S^{-1}$ mimicking the formula $Q^{-1} = Q^T$ for an
orthogonal matrix. (b) Use your formula to determine the inverse of the wavelet matrix
$W$ whose columns form the orthogonal wavelet basis (4.9) of $\mathbb{R}^4$.

♦ 4.3.19. Let $\mathbf{v}_1, \ldots, \mathbf{v}_n$ and $\mathbf{w}_1, \ldots, \mathbf{w}_n$ be two sets of linearly independent vectors in $\mathbb{R}^n$. Show
that all their dot products are the same, so $\mathbf{v}_i \cdot \mathbf{v}_j = \mathbf{w}_i \cdot \mathbf{w}_j$ for all $i, j = 1, \ldots, n$, if and
only if there is an orthogonal matrix $Q$ such that $\mathbf{w}_i = Q\mathbf{v}_i$ for all $i = 1, \ldots, n$.

4.3.20. Suppose $\mathbf{u}_1, \ldots, \mathbf{u}_k$ form an orthonormal set of vectors in $\mathbb{R}^n$ with $k < n$. Let
$Q = (\,\mathbf{u}_1 \ \mathbf{u}_2 \ \ldots \ \mathbf{u}_k\,)$ denote the $n \times k$ matrix whose columns are the orthonormal vectors.
(a) Prove that $Q^T Q = I_k$. (b) Is $Q\,Q^T = I_n$?

♦ 4.3.21. Let $\mathbf{u}_1, \ldots, \mathbf{u}_n$ and $\tilde{\mathbf{u}}_1, \ldots, \tilde{\mathbf{u}}_n$ be orthonormal bases of an inner product space $V$.
Prove that $\tilde{\mathbf{u}}_i = \sum_{j=1}^{n} q_{ij}\,\mathbf{u}_j$ for $i = 1, \ldots, n$, where $Q = (q_{ij})$ is an orthogonal matrix.

4.3.22. Let $A$ be an $m \times n$ matrix whose columns are nonzero, mutually orthogonal vectors in
$\mathbb{R}^m$. (a) Explain why $m \ge n$. (b) Prove that $A^T A$ is a diagonal matrix. What are the
diagonal entries? (c) Is $A\,A^T$ diagonal?

♦ 4.3.23. Let $K > 0$ be a positive definite $n \times n$ matrix. Prove that an $n \times n$ matrix $S$ satisfies
$S^T K S = I$ if and only if the columns of $S$ form an orthonormal basis of $\mathbb{R}^n$ with respect to
the inner product $\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T K \mathbf{w}$.

♥ 4.3.24. Groups: A set of $n \times n$ matrices $G \subset \mathcal{M}_{n \times n}$ is said to form a group if
(1) whenever $A, B \in G$, so is the product $AB \in G$, and
(2) whenever $A \in G$, then $A$ is nonsingular, and $A^{-1} \in G$.
(a) Show that $I \in G$. (b) Prove that the following sets of $n \times n$ matrices form a group:
(i) all nonsingular matrices; (ii) all nonsingular upper triangular matrices; (iii) all
matrices of determinant 1; (iv) all orthogonal matrices; (v) all proper orthogonal matrices;
(vi) all permutation matrices; (vii) all $2 \times 2$ matrices with integer entries and determinant
equal to 1. (c) Explain why the set of all nonsingular $2 \times 2$ matrices with integer entries
does not form a group. (d) Does the set of positive definite matrices form a group?


♥ 4.3.25. Unitary matrices: A complex, square matrix $U$ is called unitary if it satisfies $U^\dagger U = I$, where $U^\dagger = \overline{U^T}$ denotes the Hermitian adjoint in which one first transposes and then takes complex conjugates of all entries. (a) Show that $U$ is a unitary matrix if and only if $U^{-1} = U^\dagger$. (b) Show that the following matrices are unitary and compute their inverses:


(0 

J_ 

v/2 

i 

j_\ 

1 

,  (ii) 

V  V2 

V2  J 

/ 


V3 

1 

Vs 

i 

V  Vs 


Vs 


Vs 


\ 


i 


2  VS 

1 

2  VS 


+  -o  — 


1 


2y/S 

1 

2x73 


+  h) 


(Hi) 


(c)  Are  the  following  matrices  unitary? 

2  1  +  2i 

1  —  2  i  3 


(0 


—  1  +  2  i  —4  —  2  i 
2  —  4  i  -2  -  i 


1 

2 
1 
2 


(in) 


1 

2 

J_ 

2 

1 

2 


1 

2 

1 

2 

1 

2 

1 

2 


P 

J_ 

2 
1 
2 

V 


12 

5 

13 

13 

5 

12 

13 

13 

(d) Show that $U$ is a unitary matrix if and only if its columns form an orthonormal basis of $\mathbb{C}^n$ with respect to the Hermitian dot product. (e) Prove that the set of unitary matrices forms a group, as defined in Exercise 4.3.24.


The  QR  Factorization 


The Gram-Schmidt procedure for orthonormalizing bases of $\mathbb{R}^n$ can be reinterpreted as a matrix factorization. This is more subtle than the LU factorization that resulted from Gaussian Elimination, but is of comparable significance, and is used in a broad range of applications in mathematics, statistics, physics, engineering, and numerical analysis.

Let $w_1, \dots, w_n$ be a basis of $\mathbb{R}^n$, and let $u_1, \dots, u_n$ be the corresponding orthonormal basis that results from any one of the three implementations of the Gram-Schmidt process. We assemble both sets of column vectors to form nonsingular $n \times n$ matrices

$$A = (\, w_1 \ w_2 \ \dots \ w_n \,), \qquad Q = (\, u_1 \ u_2 \ \dots \ u_n \,).$$

Since the $u_i$ form an orthonormal basis, $Q$ is an orthogonal matrix. In view of the matrix multiplication formula (2.13), the Gram-Schmidt equations (4.23) can be recast into an equivalent matrix form:

$$A = QR, \qquad \text{where} \qquad R = \begin{pmatrix} r_{11} & r_{12} & \dots & r_{1n} \\ 0 & r_{22} & \dots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & r_{nn} \end{pmatrix} \tag{4.32}$$

is an upper triangular matrix whose entries are the coefficients in (4.26-27). Since the Gram-Schmidt process works on any basis, the only requirement on the matrix $A$ is that its columns form a basis of $\mathbb{R}^n$, and hence $A$ can be any nonsingular matrix. We have therefore established the celebrated QR factorization of nonsingular matrices.


Theorem 4.24. Every nonsingular matrix can be factored, $A = QR$, into the product of an orthogonal matrix $Q$ and an upper triangular matrix $R$. The factorization is unique if $R$ is positive upper triangular, meaning that all of its diagonal entries are positive.
QR Factorization of a Matrix A

start
    for j = 1 to n
        set r_jj = sqrt( a_1j^2 + ... + a_nj^2 )
        if r_jj = 0, stop; print "A has linearly dependent columns"
        else for i = 1 to n
            set a_ij = a_ij / r_jj
        next i
        for k = j + 1 to n
            set r_jk = a_1j a_1k + ... + a_nj a_nk
            for i = 1 to n
                set a_ik = a_ik - a_ij r_jk
            next i
        next k
    next j
end


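The proof of uniqueness is relegated to Exercise 4.3.30.

Example 4.25. The columns of the matrix $A = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 0 & -2 \\ -1 & 2 & 3 \end{pmatrix}$ are the same basis vectors considered in Example 4.16. The orthonormal basis (4.21) constructed using the Gram-Schmidt algorithm leads to the orthogonal and upper triangular matrices

$$Q = \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{4}{\sqrt{42}} & \frac{2}{\sqrt{14}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{42}} & -\frac{3}{\sqrt{14}} \\ -\frac{1}{\sqrt{3}} & \frac{5}{\sqrt{42}} & -\frac{1}{\sqrt{14}} \end{pmatrix}, \qquad R = \begin{pmatrix} \sqrt{3} & -\frac{1}{\sqrt{3}} & -\sqrt{3} \\ 0 & \frac{\sqrt{14}}{\sqrt{3}} & \frac{\sqrt{21}}{\sqrt{2}} \\ 0 & 0 & \frac{\sqrt{7}}{\sqrt{2}} \end{pmatrix}. \tag{4.33}$$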


The  reader  may  wish  to  verify  that,  indeed,  A  =  Q R. 

While any of the three implementations of the Gram-Schmidt algorithm will produce the QR factorization of a given matrix $A = (\, w_1 \ w_2 \ \dots \ w_n \,)$, the stable version, as encoded in equations (4.28), is the one to use in practical computations, since it is the least likely to fail due to numerical artifacts produced by round-off errors. The accompanying pseudocode program reformulates the algorithm purely in terms of the matrix entries $a_{ij}$ of $A$. During the course of the algorithm, the entries of the matrix $A$ are successively overwritten; the final result is the orthogonal matrix $Q$ appearing in place of $A$. The entries $r_{ij}$ of $R$ must be stored separately.
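The pseudocode program translates almost line for line into a short routine. The following is a minimal sketch in plain Python, assuming the matrix is stored as a list of rows; the function name `qr_gram_schmidt` is an illustrative choice, and, unlike the pseudocode, the sketch works on a copy rather than overwriting its input. The exact test `r_jj == 0` is kept from the pseudocode; in floating point arithmetic a small tolerance would usually be more appropriate.

```python
import math

def qr_gram_schmidt(a):
    """Return (Q, R) for a square matrix given as a list of rows.

    A working copy of A is gradually overwritten by Q, column by column,
    while the entries of R are stored separately.
    """
    n = len(a)
    q = [row[:] for row in a]               # working copy, becomes Q
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # r_jj is the norm of the current j-th column
        r[j][j] = math.sqrt(sum(q[i][j] ** 2 for i in range(n)))
        if r[j][j] == 0.0:
            raise ValueError("A has linearly dependent columns")
        for i in range(n):
            q[i][j] /= r[j][j]               # normalize column j
        for k in range(j + 1, n):
            # r_jk is the dot product of column j with column k
            r[j][k] = sum(q[i][j] * q[i][k] for i in range(n))
            for i in range(n):
                q[i][k] -= q[i][j] * r[j][k]  # make column k orthogonal to column j
    return q, r

# The matrix of Example 4.25
A = [[1.0, 1.0, 2.0], [1.0, 0.0, -2.0], [-1.0, 2.0, 3.0]]
Q, R = qr_gram_schmidt(A)
```

Up to round-off, the returned `Q` and `R` should agree with the matrices in (4.33), and multiplying them back together should recover `A`.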


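Example 4.26. Let us factor the matrix

$$A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}$$

using the numerically stable QR algorithm. As in the program, we work directly on the matrix $A$, gradually changing it into orthogonal form. In the first loop, we set $r_{11} = \sqrt{5}$ to be the norm of the first column vector of $A$. We then normalize the first column by dividing by $r_{11}$; the resulting matrix is

$$\begin{pmatrix} \frac{2}{\sqrt{5}} & 1 & 0 & 0 \\ \frac{1}{\sqrt{5}} & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}.$$

The next entries $r_{12} = \frac{4}{\sqrt{5}}$, $r_{13} = \frac{1}{\sqrt{5}}$, $r_{14} = 0$, are obtained by taking the dot products of the first column with the other three columns. For $j = 2, 3, 4$, we subtract $r_{1j}$ times the first column from the $j$th column; the result

$$\begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{5} & -\frac{2}{5} & 0 \\ \frac{1}{\sqrt{5}} & \frac{6}{5} & \frac{4}{5} & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}$$

is a matrix whose first column is normalized to have unit length, and whose second, third and fourth columns are orthogonal to it. In the next loop, we normalize the second column by dividing by its norm $r_{22} = \sqrt{\frac{14}{5}}$, and so obtain the matrix

$$\begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{\sqrt{70}} & -\frac{2}{5} & 0 \\ \frac{1}{\sqrt{5}} & \frac{6}{\sqrt{70}} & \frac{4}{5} & 0 \\ 0 & \frac{5}{\sqrt{70}} & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}.$$

We then take dot products of the second column with the remaining two columns to produce $r_{23} = \frac{16}{\sqrt{70}}$, $r_{24} = \frac{5}{\sqrt{70}}$. Subtracting these multiples of the second column from the third and fourth columns, we obtain

$$\begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{\sqrt{70}} & \frac{2}{7} & \frac{3}{14} \\ \frac{1}{\sqrt{5}} & \frac{6}{\sqrt{70}} & -\frac{4}{7} & -\frac{3}{7} \\ 0 & \frac{5}{\sqrt{70}} & \frac{6}{7} & \frac{9}{14} \\ 0 & 0 & 1 & 2 \end{pmatrix},$$

which now has its first two columns orthonormalized, and orthogonal to the last two columns. We then normalize the third column by dividing by $r_{33} = \sqrt{\frac{15}{7}}$, yielding

$$\begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{\sqrt{70}} & \frac{2}{\sqrt{105}} & \frac{3}{14} \\ \frac{1}{\sqrt{5}} & \frac{6}{\sqrt{70}} & -\frac{4}{\sqrt{105}} & -\frac{3}{7} \\ 0 & \frac{5}{\sqrt{70}} & \frac{6}{\sqrt{105}} & \frac{9}{14} \\ 0 & 0 & \frac{7}{\sqrt{105}} & 2 \end{pmatrix}.$$

Finally, we subtract $r_{34} = \frac{20}{\sqrt{105}}$ times the third column from the fourth column. Dividing the resulting fourth column by its norm $r_{44} = \sqrt{\frac{5}{6}}$ results in the final formulas,

$$Q = \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{\sqrt{70}} & \frac{2}{\sqrt{105}} & -\frac{1}{\sqrt{30}} \\ \frac{1}{\sqrt{5}} & \frac{6}{\sqrt{70}} & -\frac{4}{\sqrt{105}} & \frac{2}{\sqrt{30}} \\ 0 & \frac{5}{\sqrt{70}} & \frac{6}{\sqrt{105}} & -\frac{3}{\sqrt{30}} \\ 0 & 0 & \frac{7}{\sqrt{105}} & \frac{4}{\sqrt{30}} \end{pmatrix}, \qquad R = \begin{pmatrix} \sqrt{5} & \frac{4}{\sqrt{5}} & \frac{1}{\sqrt{5}} & 0 \\ 0 & \frac{\sqrt{14}}{\sqrt{5}} & \frac{16}{\sqrt{70}} & \frac{5}{\sqrt{70}} \\ 0 & 0 & \frac{\sqrt{15}}{\sqrt{7}} & \frac{20}{\sqrt{105}} \\ 0 & 0 & 0 & \frac{\sqrt{5}}{\sqrt{6}} \end{pmatrix},$$

for the $A = QR$ factorization.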
Ill-Conditioned Systems and Householder’s Method

The QR factorization can be employed as an alternative to Gaussian Elimination to solve linear systems. Indeed, the system

$$Ax = b \quad \text{becomes} \quad QRx = b, \quad \text{and hence} \quad Rx = Q^T b, \tag{4.34}$$

because $Q$ is an orthogonal matrix, and so $Q^{-1} = Q^T$. Since $R$ is upper triangular, the latter system can be solved for $x$ by Back Substitution. The resulting algorithm, while more expensive to compute, offers some numerical advantages over traditional Gaussian Elimination, since it is less prone to inaccuracies resulting from ill-conditioning.


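Example 4.27. Let us apply the $A = QR$ factorization

$$\begin{pmatrix} 1 & 1 & 2 \\ 1 & 0 & -2 \\ -1 & 2 & 3 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{4}{\sqrt{42}} & \frac{2}{\sqrt{14}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{42}} & -\frac{3}{\sqrt{14}} \\ -\frac{1}{\sqrt{3}} & \frac{5}{\sqrt{42}} & -\frac{1}{\sqrt{14}} \end{pmatrix} \begin{pmatrix} \sqrt{3} & -\frac{1}{\sqrt{3}} & -\sqrt{3} \\ 0 & \frac{\sqrt{14}}{\sqrt{3}} & \frac{\sqrt{21}}{\sqrt{2}} \\ 0 & 0 & \frac{\sqrt{7}}{\sqrt{2}} \end{pmatrix}$$

to solve the linear system $Ax = (\,0, -4, 5\,)^T$. We first compute

$$Q^T b = \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \\ \frac{4}{\sqrt{42}} & \frac{1}{\sqrt{42}} & \frac{5}{\sqrt{42}} \\ \frac{2}{\sqrt{14}} & -\frac{3}{\sqrt{14}} & -\frac{1}{\sqrt{14}} \end{pmatrix} \begin{pmatrix} 0 \\ -4 \\ 5 \end{pmatrix} = \begin{pmatrix} -3\sqrt{3} \\ \frac{\sqrt{21}}{\sqrt{2}} \\ \frac{\sqrt{7}}{\sqrt{2}} \end{pmatrix}.$$

We then solve the upper triangular system

$$Rx = \begin{pmatrix} \sqrt{3} & -\frac{1}{\sqrt{3}} & -\sqrt{3} \\ 0 & \frac{\sqrt{14}}{\sqrt{3}} & \frac{\sqrt{21}}{\sqrt{2}} \\ 0 & 0 & \frac{\sqrt{7}}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -3\sqrt{3} \\ \frac{\sqrt{21}}{\sqrt{2}} \\ \frac{\sqrt{7}}{\sqrt{2}} \end{pmatrix}$$

by Back Substitution, leading to the solution $x = (\,-2, 0, 1\,)^T$.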
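The two steps of (4.34), forming $Q^T b$ and then back substituting through $R$, can be sketched as follows. This is an illustrative plain-Python version in which the factors from (4.33) are entered by hand; the helper name `solve_qr` is an assumption of this sketch, not notation from the text.

```python
import math

s3, s14, s42 = math.sqrt(3), math.sqrt(14), math.sqrt(42)

# The factors Q and R from (4.33), stored as lists of rows
Q = [[ 1 / s3, 4 / s42,  2 / s14],
     [ 1 / s3, 1 / s42, -3 / s14],
     [-1 / s3, 5 / s42, -1 / s14]]
R = [[s3, -1 / s3, -s3],
     [0.0, math.sqrt(14 / 3), math.sqrt(21 / 2)],
     [0.0, 0.0, math.sqrt(7 / 2)]]
b = [0.0, -4.0, 5.0]

def solve_qr(q, r, b):
    """Solve Q R x = b for orthogonal Q and nonsingular upper triangular R."""
    n = len(b)
    # c = Q^T b: dot each column of Q with b
    c = [sum(q[i][j] * b[i] for i in range(n)) for j in range(n)]
    # Back Substitution on R x = c, working from the last equation upward
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(r[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (c[i] - s) / r[i][i]
    return x

x = solve_qr(Q, R, b)   # approximately (-2, 0, 1)
```

No matrix inverse is ever formed: the orthogonality of $Q$ reduces the first step to a transpose-times-vector product.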


In  computing  the  Q  R  factorization  of  a  mildly  ill-conditioned  matrix,  one  should  employ 
the  stable  version  (4.28)  of  the  Gram-Schmidt  process.  However,  yet  more  recalcitrant 
matrices  require  a  completely  different  approach  to  the  factorization,  as  formulated  by  the 
mid-twentieth-century  American  mathematician  Alston  Householder.  His  idea  was  to  use  a 
sequence  of  certain  simple  orthogonal  matrices  to  gradually  convert  the  matrix  into  upper 
triangular  form. 

Consider the Householder or elementary reflection matrix

$$H = I - 2\,u u^T, \tag{4.35}$$

in which $u$ is a unit vector (in the Euclidean norm). Geometrically, the matrix $H$ represents a reflection of vectors through the subspace

$$u^\perp = \{\, v \mid v \cdot u = 0 \,\} \tag{4.36}$$

consisting of all vectors orthogonal to $u$, as illustrated in Figure 4.3. It is a symmetric orthogonal matrix, and so

$$H^T = H, \qquad H^2 = I, \qquad H^{-1} = H. \tag{4.37}$$
Figure 4.3. Elementary Reflection Matrix.


The proof is straightforward: symmetry is immediate, while

$$H H^T = H^2 = (I - 2\,u u^T)(I - 2\,u u^T) = I - 4\,u u^T + 4\,u\,(u^T u)\,u^T = I,$$

since, by assumption, $u^T u = \|u\|^2 = 1$. Thus, by suitably forming the unit vector $u$, we can construct a Householder matrix that interchanges any two vectors of the same length.


Lemma 4.28. Let $v, w \in \mathbb{R}^n$ with $\|v\| = \|w\|$. Set $u = (v - w)/\|v - w\|$. Let $H = I - 2\,u u^T$ be the corresponding elementary reflection matrix. Then $Hv = w$ and $Hw = v$.

Proof: Keeping in mind that $v$ and $w$ have the same Euclidean norm, so that $\|v - w\|^2 = 2\,\|v\|^2 - 2\,v \cdot w$, we compute

$$Hv = (I - 2\,u u^T)\,v = v - 2\,\frac{(v - w)(v - w)^T v}{\|v - w\|^2} = v - 2\,\frac{\|v\|^2 - w \cdot v}{2\,\|v\|^2 - 2\,v \cdot w}\,(v - w) = v - (v - w) = w.$$

The proof of the second equation is similar. Q.E.D.
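The lemma is easy to check numerically. The sketch below, in plain Python with illustrative helper names `householder_matrix` and `matvec`, builds $H$ from a pair of vectors of equal norm and applies it to both; it assumes $v \neq w$, since otherwise the unit vector $u$ is undefined.

```python
import math

def householder_matrix(v, w):
    """Return H = I - 2 u u^T with u = (v - w) / ||v - w||.

    Assumes ||v|| == ||w|| and v != w, as in Lemma 4.28.
    """
    d = [vi - wi for vi, wi in zip(v, w)]
    norm = math.sqrt(sum(x * x for x in d))
    u = [x / norm for x in d]
    n = len(v)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(n)]
            for i in range(n)]

def matvec(m, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

v = [1.0, 1.0, -1.0]
w = [math.sqrt(3), 0.0, 0.0]      # same Euclidean norm as v
H = householder_matrix(v, w)
Hv = matvec(H, v)                  # should reproduce w up to round-off
Hw = matvec(H, w)                  # should reproduce v up to round-off
```

The particular choice $w = \|v\|\, e_1$ used here is the one that turns a column into a multiple of the first standard basis vector.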


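In the first phase of Householder’s method, we introduce the elementary reflection matrix that maps the first column $v_1$ of the matrix $A$ to a multiple of the first standard basis vector, namely $w_1 = \|v_1\|\,e_1$, noting that $\|v_1\| = \|w_1\|$. Assuming $v_1 \neq c\,e_1$, we define the first unit vector and corresponding elementary reflection matrix as

$$u_1 = \frac{v_1 - \|v_1\|\,e_1}{\bigl\|\, v_1 - \|v_1\|\,e_1 \,\bigr\|}, \qquad H_1 = I - 2\,u_1 u_1^T.$$

On the other hand, if $v_1 = c\,e_1$ is already in the desired form, then we set $u_1 = 0$ and $H_1 = I$. Since, by the lemma, $H_1 v_1 = w_1$, when we multiply $A$ on the left by $H_1$, we obtain a matrix

$$A_2 = H_1 A = \begin{pmatrix} r_{11} & \widetilde{a}_{12} & \widetilde{a}_{13} & \dots & \widetilde{a}_{1n} \\ 0 & \widetilde{a}_{22} & \widetilde{a}_{23} & \dots & \widetilde{a}_{2n} \\ 0 & \widetilde{a}_{32} & \widetilde{a}_{33} & \dots & \widetilde{a}_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \widetilde{a}_{n2} & \widetilde{a}_{n3} & \dots & \widetilde{a}_{nn} \end{pmatrix}$$

whose first column is in the desired upper triangular form.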

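In the next phase, we construct a second elementary reflection matrix to make all the entries below the diagonal in the second column of $A_2$ zero, keeping in mind that, at the same time, we should not mess up the first column. The latter requirement tells us that the vector used for the reflection should have a zero in its first entry. The correct choice is to set

$$v_2 = (\, 0, \widetilde{a}_{22}, \widetilde{a}_{32}, \dots, \widetilde{a}_{n2} \,)^T, \qquad u_2 = \frac{v_2 - \|v_2\|\,e_2}{\bigl\|\, v_2 - \|v_2\|\,e_2 \,\bigr\|}, \qquad H_2 = I - 2\,u_2 u_2^T.$$

As before, if $v_2 = c\,e_2$, then $u_2 = 0$ and $H_2 = I$. The net effect is

$$A_3 = H_2 A_2 = \begin{pmatrix} r_{11} & r_{12} & \widehat{a}_{13} & \dots & \widehat{a}_{1n} \\ 0 & r_{22} & \widehat{a}_{23} & \dots & \widehat{a}_{2n} \\ 0 & 0 & \widehat{a}_{33} & \dots & \widehat{a}_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \widehat{a}_{n3} & \dots & \widehat{a}_{nn} \end{pmatrix},$$

and now the first two columns are in upper triangular form.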

The process continues; at the $k$th stage, we are dealing with a matrix $A_k$ whose first $k - 1$ columns coincide with the first $k - 1$ columns of the eventual upper triangular matrix $R$. Let $v_k$ denote the vector obtained from the $k$th column of $A_k$ by setting its initial $k - 1$ entries equal to 0. We define the $k$th Householder vector and corresponding elementary reflection matrix by

$$w_k = v_k - \|v_k\|\,e_k, \qquad u_k = \begin{cases} w_k / \|w_k\|, & \text{if } w_k \neq 0, \\ 0, & \text{if } w_k = 0, \end{cases} \qquad H_k = I - 2\,u_k u_k^T, \qquad A_{k+1} = H_k A_k. \tag{4.38}$$

The process is completed after $n - 1$ steps, and the final result is

$$R = H_{n-1} A_{n-1} = H_{n-1} H_{n-2} \cdots H_1 A = Q^T A, \qquad \text{where} \qquad Q = H_1 H_2 \cdots H_{n-1}$$

is an orthogonal matrix, since it is the product of orthogonal matrices, cf. Proposition 4.23. In this manner, we have reproduced a† QR factorization of

$$A = QR = H_1 H_2 \cdots H_{n-1} R. \tag{4.39}$$

† The upper triangular matrix $R$ may not have positive diagonal entries; if desired, this can be easily fixed by changing the signs of the appropriate columns of $Q$ and of the corresponding rows of $R$.

Example 4.29. Let us implement Householder’s Method on the particular matrix

$$A = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 0 & -2 \\ -1 & 2 & 3 \end{pmatrix}$$

considered earlier in Example 4.25. The first Householder vector

$$w_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} - \sqrt{3} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} -.7321 \\ 1 \\ -1 \end{pmatrix}$$

leads to the elementary reflection matrix

$$H_1 = \begin{pmatrix} .5774 & .5774 & -.5774 \\ .5774 & .2113 & .7887 \\ -.5774 & .7887 & .2113 \end{pmatrix}, \qquad \text{whereby} \qquad A_2 = H_1 A = \begin{pmatrix} 1.7321 & -.5774 & -1.7321 \\ 0 & 2.1547 & 3.0981 \\ 0 & -.1547 & -2.0981 \end{pmatrix}.$$

To construct the second and final Householder matrix, we start with the second column of $A_2$ and then set the first entry to 0; the resulting Householder vector is

$$w_2 = \begin{pmatrix} 0 \\ 2.1547 \\ -.1547 \end{pmatrix} - 2.1603 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ -.0056 \\ -.1547 \end{pmatrix}.$$

Therefore,

$$H_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & .9974 & -.0716 \\ 0 & -.0716 & -.9974 \end{pmatrix}, \qquad \text{and so} \qquad R = H_2 A_2 = \begin{pmatrix} 1.7321 & -.5774 & -1.7321 \\ 0 & 2.1603 & 3.2404 \\ 0 & 0 & 1.8708 \end{pmatrix}$$

is the upper triangular matrix in the QR decomposition of $A$. The orthogonal matrix $Q$ is obtained by multiplying the reflection matrices:

$$Q = H_1 H_2 = \begin{pmatrix} .5774 & .6172 & .5345 \\ .5774 & .1543 & -.8018 \\ -.5774 & .7715 & -.2673 \end{pmatrix},$$

which numerically reconfirms the previous factorization (4.33).
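The stages of Householder's Method can be sketched in a few lines of plain Python. The routine below is illustrative only: the name `householder_qr` and the list-of-rows storage are assumptions of the sketch, and it applies each reflection through the identity $H_k A_k = A_k - 2\,u_k (u_k^T A_k)$ rather than forming $H_k$ explicitly. It follows the sign convention $w_k = v_k - \|v_k\|\,e_k$ of (4.38); library implementations often choose the sign of the shift so as to avoid cancellation when $v_k$ is already nearly a positive multiple of $e_k$.

```python
import math

def householder_qr(a):
    """Return (us, R): the unit Householder vectors u_1, ..., u_{n-1}
    and the upper triangular factor R of a square matrix (list of rows)."""
    n = len(a)
    r = [row[:] for row in a]                 # working copy, becomes R
    us = []
    for k in range(n - 1):
        # v_k: the k-th column with the entries above the diagonal set to 0
        v = [0.0] * k + [r[i][k] for i in range(k, n)]
        norm_v = math.sqrt(sum(x * x for x in v))
        w = v[:]
        w[k] -= norm_v                        # w_k = v_k - ||v_k|| e_k
        norm_w = math.sqrt(sum(x * x for x in w))
        u = [x / norm_w for x in w] if norm_w > 0.0 else [0.0] * n
        # A_{k+1} = H_k A_k, applied one column at a time
        for j in range(n):
            dot = sum(u[i] * r[i][j] for i in range(n))
            for i in range(n):
                r[i][j] -= 2.0 * u[i] * dot
        us.append(u)
    return us, r

# The matrix of Example 4.29
A = [[1.0, 1.0, 2.0], [1.0, 0.0, -2.0], [-1.0, 2.0, 3.0]]
us, R = householder_qr(A)
```

For this matrix the diagonal of `R` comes out approximately as 1.7321, 2.1603, 1.8708, matching the example, with the entries below the diagonal zero up to round-off.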


Remark. If the purpose of the QR factorization is to solve a linear system via (4.34), it is not necessary to explicitly multiply out the Householder matrices to form $Q$; we merely need to store the corresponding unit Householder vectors $u_1, \dots, u_{n-1}$. The solution to

$$Ax = QRx = b \quad \text{can be found by solving} \quad Rx = H_{n-1} H_{n-2} \cdots H_1\, b \tag{4.40}$$

by Back Substitution. This is the method of choice for moderately ill-conditioned systems. Severe ill-conditioning will defeat even this ingenious approach, and accurately solving such systems can be an extreme challenge.
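The point of the remark is that each reflection can be applied to a vector directly from its unit vector, using $H b = b - 2\,(u \cdot b)\,u$, at a cost proportional to $n$ rather than $n^2$. A minimal sketch in plain Python follows; the function names and the small $2 \times 2$ illustration are hypothetical and not taken from the text.

```python
import math

def reflect(u, b):
    """Apply H = I - 2 u u^T to b without forming H: H b = b - 2 (u . b) u."""
    dot = sum(ui * bi for ui, bi in zip(u, b))
    return [bi - 2.0 * dot * ui for ui, bi in zip(u, b)]

def solve_with_reflections(us, r, b):
    """Solve R x = H_{n-1} ... H_1 b by Back Substitution, as in (4.40)."""
    c = b[:]
    for u in us:                      # H_1 is applied first, H_{n-1} last
        c = reflect(u, c)
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(r[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (c[i] - s) / r[i][i]
    return x

# Hypothetical illustration: A = [[3, 1], [4, 2]] has first column (3, 4) of
# norm 5, so w_1 = (-2, 4), u_1 = w_1 / ||w_1||, and R = H_1 A = [[5, 2.2], [0, -0.4]].
u1 = [-1 / math.sqrt(5), 2 / math.sqrt(5)]
R = [[5.0, 2.2], [0.0, -0.4]]
x = solve_with_reflections([u1], R, [4.0, 6.0])   # A (1, 1)^T = (4, 6)^T
```

Note that the diagonal entry $-0.4$ of `R` is negative here, which is harmless for solving the system.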


Exercises 

4.3.26.  Write  down  the  QR  matrix  factorization  corresponding  to  the  vectors  in  Example  4.17. 


4.3.27.  Find  the  QR  factorization  of  the  following  matrices:  (a) 


1 

2 


3 

1 


(b) 


2  1  -1\ 

°  i  2\ 

(  0  0  2\ 

(c) 

0  1  3 

,  (d) 

-111,  (e) 

0  4  1 

,  (f) 

V-1  -1  1 J 

V-1  1  3/ 

V-1  0  l) 

/ 1 
1 
1 
VI 


1 

2 

1 

0 


4 

3 

1 

1 

2 

1 


3 
2 

1\ 
0 
1 
17 


4.3.28.  For  each  of  the  following  linear  systems,  find  the  QR  factorization  of  the  coefficient 

-1  a)  (y)  =  (  2 

2  1  -1\  / x\  2\  1  1  0\  f x\  /0 


matrix,  and  then  use  your  factorization  to  solve  the  system:  ( i ) 


(m) 


1  0  2  y  I  =  -1  ,  (in)  -1  0  1 

2  -1  3 )  \z)  Oj  0  -1  1) 

♠ 4.3.29. Use the numerically stable version of the Gram-Schmidt process to find the QR factorizations of the $3 \times 3$, $4 \times 4$ and $5 \times 5$ versions of the tridiagonal matrix that has 4’s along the diagonal and 1’s on the sub- and super-diagonals, as in Example 1.37.

♦ 4.3.30. Prove that the QR factorization of a matrix is unique if all the diagonal entries of $R$ are assumed to be positive. Hint: Use Exercise 4.3.12.
♥ 4.3.31. (a) How many arithmetic operations are required to compute the QR factorization of an $n \times n$ matrix? (b) How many additional operations are needed to utilize the factorization to solve a linear system $Ax = b$ via (4.34)? (c) Compare the amount of computational effort with standard Gaussian Elimination.


♦ 4.3.32. Suppose $A$ is an $m \times n$ matrix with $\operatorname{rank} A = n$. (a) Show that applying the Gram-Schmidt algorithm to the columns of $A$ produces an orthonormal basis for $\operatorname{img} A$. (b) Prove that this is equivalent to the matrix factorization $A = QR$, where $Q$ is an $m \times n$ matrix with orthonormal columns, while $R$ is a nonsingular $n \times n$ upper triangular matrix. (c) Show that the QR program in the text also works for rectangular, $m \times n$, matrices as stated, the only modification being that the row indices $i$ run from 1 to $m$. (d) Apply this method to factor


fl  -1\ 

/  —3  2  \ 

(  ~ 1  1\ 

/  °  i  2  \ 

(0 

2  3 

,  (ii) 

1  -1 

,  (in) 

1  -2 
-1  -3 

,  (iv) 

-3  1  -1 

-1  0  -2 

\°  V 

V  4  1/ 

^  0  5  / 

\  1  1  -2/ 

(e)  Explain  what  happens  if  rank  A  <  n. 


♥ 4.3.33. (a) According to Exercise 4.2.14, the Gram-Schmidt process can also be applied to produce orthonormal bases of complex vector spaces. In the case of $\mathbb{C}^n$, explain how this is equivalent to the factorization of a nonsingular complex matrix $A = UR$ into the product of a unitary matrix $U$ (see Exercise 4.3.25) and a nonsingular upper triangular matrix $R$. (b) Factor the following complex matrices into unitary times upper triangular:


\ 

( i 

1 

°\ 

(  i 

1 

-i  \ 

,  (in) 

i 

i 

1 

,  (iv) 

1  -  i 

0 

1  +  i 

! 

^0 

1 

i  / 

l  -1 

2  +  3  i 

i  ) 

(c)  What  can  you  say  about  uniqueness  of  the  factorization? 


4.3.34.  (a)  Write  down  the  Householder  matrices  corresponding  to  the  following  unit  vectors: 

(*)  (l,0f,  («)  (§>i)T’(“*)  (0, 1,0)T,  (iv)  (^j,0, -^=)  •  (b)  Find  all  vectors 

fixed by a Householder matrix, i.e., $Hv = v$, first for the matrices in part (a), and then in general. (c) Is a Householder matrix a proper or improper orthogonal matrix?


4.3.35.  Use  Householder’s  Method  to  solve  Exercises  4.3.27  and  4.3.29. 

♠ 4.3.36. Let $H_n = Q_n R_n$ be the QR factorization of the $n \times n$ Hilbert matrix (1.72). (a) Find $Q_n$ and $R_n$ for $n = 2, 3, 4$. (b) Use a computer to find $Q_n$ and $R_n$ for $n = 10$ and $20$. (c) Let $x^\star \in \mathbb{R}^n$ denote the vector whose $i$th entry is $x_i^\star = (-1)^i\, i/(i+1)$. For the values of $n$ in parts (a) and (b), compute $y^\star = H_n x^\star$. Then solve the system $H_n x = y^\star$ (i) directly using Gaussian Elimination; (ii) using the QR factorization based on (4.34); (iii) using Householder’s Method. Compare the results to the correct solution $x^\star$ and discuss the pros and cons of each method.


4.3.37. Write out a pseudocode program to implement Householder’s Method. The input should be an $n \times n$ matrix $A$ and the output should be the Householder unit vectors $u_1, \dots, u_{n-1}$ and the upper triangular matrix $R$. Test your code on one of the examples in Exercises 4.3.26-28.


4.4  Orthogonal  Projections  and  Orthogonal  Subspaces 

Orthogonality is important, not just for individual vectors, but also for subspaces. In this section, we develop two concepts. First, we investigate the orthogonal projection of a vector onto a subspace, an operation that plays a key role in least squares minimization and data fitting, as we shall discuss in Chapter 5. Second, we develop the concept of orthogonality for a pair of subspaces, culminating with a proof of the orthogonality of the fundamental subspaces associated with an $m \times n$ matrix that at last reveals the striking geometry that underlies linear systems of equations and matrix multiplication.

Figure 4.4. The Orthogonal Projection of a Vector onto a Subspace.

Orthogonal  Projection 

Throughout this section, $W \subset V$ will be a finite-dimensional subspace of a real inner product space. The inner product space $V$ is allowed to be infinite-dimensional. But, to facilitate your geometric intuition, you may initially want to view $W$ as a subspace of Euclidean space $V = \mathbb{R}^m$ equipped with the ordinary dot product.

Definition 4.30. A vector $z \in V$ is said to be orthogonal to the subspace $W \subset V$ if it is orthogonal to every vector in $W$, so $\langle z, w \rangle = 0$ for all $w \in W$.

Given a basis $w_1, \dots, w_n$ of the subspace $W$, we note that $z$ is orthogonal to $W$ if and only if it is orthogonal to every basis vector: $\langle z, w_i \rangle = 0$ for $i = 1, \dots, n$. Indeed, any other vector in $W$ has the form $w = c_1 w_1 + \cdots + c_n w_n$, and hence, by linearity, $\langle z, w \rangle = c_1 \langle z, w_1 \rangle + \cdots + c_n \langle z, w_n \rangle = 0$, as required.

Definition 4.31. The orthogonal projection of $v$ onto the subspace $W$ is the element $w \in W$ that makes the difference $z = v - w$ orthogonal to $W$.

The geometric configuration underlying orthogonal projection is sketched in Figure 4.4. As we shall see, the orthogonal projection is unique. Note that $v = w + z$ is the sum of its orthogonal projection $w \in W$ and the perpendicular vector $z \perp W$.

The  explicit  construction  is  greatly  simplified  by  taking  an  orthonormal  basis  of  the 
subspace,  which,  if  necessary,  can  be  arranged  by  applying  the  Gram-Schmidt  process 
to  a  known  basis.  (The  direct  construction  of  the  orthogonal  projection  in  terms  of  a 
non-orthogonal  basis  appears  in  Exercise  4.4.10.) 

Theorem 4.32. Let $u_1, \dots, u_n$ be an orthonormal basis for the subspace $W \subset V$. Then the orthogonal projection $w \in W$ of $v \in V$ is given by

$$w = c_1 u_1 + \cdots + c_n u_n \qquad \text{where} \qquad c_i = \langle v, u_i \rangle, \quad i = 1, \dots, n. \tag{4.41}$$

Proof: First, since $u_1, \dots, u_n$ form a basis of the subspace, the orthogonal projection element must be some linear combination thereof: $w = c_1 u_1 + \cdots + c_n u_n$. Definition 4.31 requires that the difference $z = v - w$ be orthogonal to $W$, and, as noted above, it suffices to check orthogonality to the basis vectors. By our orthonormality assumption,

$$0 = \langle z, u_i \rangle = \langle v - w, u_i \rangle = \langle v - c_1 u_1 - \cdots - c_n u_n, u_i \rangle = \langle v, u_i \rangle - c_1 \langle u_1, u_i \rangle - \cdots - c_n \langle u_n, u_i \rangle = \langle v, u_i \rangle - c_i.$$

The coefficients $c_i = \langle v, u_i \rangle$ of the orthogonal projection $w$ are thus uniquely prescribed by the orthogonality requirement, which thereby proves its uniqueness. Q.E.D.


More generally, if we employ an orthogonal basis $v_1, \dots, v_n$ for the subspace $W$, then the same argument demonstrates that the orthogonal projection of $v$ onto $W$ is given by

$$w = a_1 v_1 + \cdots + a_n v_n, \qquad \text{where} \qquad a_i = \frac{\langle v, v_i \rangle}{\|v_i\|^2}, \quad i = 1, \dots, n. \tag{4.42}$$

We could equally well replace the orthogonal basis by the orthonormal basis obtained by dividing each vector by its length: $u_i = v_i / \|v_i\|$. The reader should be able to prove that the two formulas (4.41, 42) for the orthogonal projection yield the same vector $w$.

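Example 4.33. Consider the plane $W \subset \mathbb{R}^3$ spanned by the orthogonal vectors

$$v_1 = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$

According to formula (4.42), the orthogonal projection of $v = (\,1, 0, 0\,)^T$ onto $W$ is

$$w = \frac{\langle v, v_1 \rangle}{\|v_1\|^2}\, v_1 + \frac{\langle v, v_2 \rangle}{\|v_2\|^2}\, v_2 = \frac{1}{6} \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} + \frac{1}{3} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{2} \\ 0 \\ \frac{1}{2} \end{pmatrix}.$$

Alternatively, we can replace $v_1, v_2$ by the orthonormal basis

$$u_1 = \frac{v_1}{\|v_1\|} = \begin{pmatrix} \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{pmatrix}, \qquad u_2 = \frac{v_2}{\|v_2\|} = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{pmatrix}.$$

Then, using the orthonormal version (4.41),

$$w = \langle v, u_1 \rangle\, u_1 + \langle v, u_2 \rangle\, u_2 = \frac{1}{\sqrt{6}} \begin{pmatrix} \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{pmatrix} + \frac{1}{\sqrt{3}} \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{pmatrix} = \begin{pmatrix} \frac{1}{2} \\ 0 \\ \frac{1}{2} \end{pmatrix}.$$

The answer is, of course, the same. As the reader may notice, while the theoretical formula is simpler when written in an orthonormal basis, for hand computations the orthogonal basis version avoids having to deal with square roots. (Of course, when the numerical computation is performed on a computer, this is not a significant issue.)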
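Formula (4.42) is straightforward to evaluate by machine. The following plain-Python sketch, with illustrative helper names `dot` and `project`, projects $v = (1, 0, 0)^T$ onto the plane spanned by the orthogonal vectors $(1, -2, 1)^T$ and $(1, 1, 1)^T$, and forms the residual $z = v - w$ so that its orthogonality to the plane can be checked. It uses the dot product and assumes the supplied basis vectors are nonzero and mutually orthogonal; it does not verify this.

```python
def dot(x, y):
    """Ordinary dot product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(x, y))

def project(v, basis):
    """Orthogonal projection of v onto span(basis) via formula (4.42).

    Assumes the basis vectors are nonzero and mutually orthogonal.
    """
    w = [0.0] * len(v)
    for b in basis:
        coeff = dot(v, b) / dot(b, b)       # a_i = <v, v_i> / ||v_i||^2
        w = [wi + coeff * bi for wi, bi in zip(w, b)]
    return w

v1, v2 = [1.0, -2.0, 1.0], [1.0, 1.0, 1.0]   # orthogonal basis of the plane W
v = [1.0, 0.0, 0.0]
w = project(v, [v1, v2])                      # approximately (1/2, 0, 1/2)
z = [vi - wi for vi, wi in zip(v, w)]         # residual, orthogonal to W
```

Dividing by `dot(b, b)` is what lets the same routine accept either an orthogonal or an orthonormal basis; in the orthonormal case that factor is simply 1.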

An intriguing observation is that the coefficients in the orthogonal projection formulas (4.41-42) coincide with the formulas (4.4, 7) for writing a vector in terms of an orthonormal or orthogonal basis. Indeed, if $v$ were an element of $W$, then it would coincide with its orthogonal projection, $w = v$. (Why?) As a result, the orthogonal projection formulas include the orthogonal basis formulas as a special case.

It is also worth noting that the same formulae occur in the Gram-Schmidt algorithm, cf. (4.19). This observation leads to a useful geometric interpretation of the Gram-Schmidt construction. For each $k = 1, \dots, n$, let

$$W_k = \operatorname{span}\{ w_1, \dots, w_k \} = \operatorname{span}\{ v_1, \dots, v_k \} = \operatorname{span}\{ u_1, \dots, u_k \} \tag{4.43}$$

denote the $k$-dimensional subspace spanned by the first $k$ basis elements, which is the same as that spanned by their orthogonalized and orthonormalized counterparts. In view of (4.41), the basic Gram-Schmidt formula (4.19) can be re-expressed in the form $v_k = w_k - p_k$, where $p_k$ is the orthogonal projection of $w_k$ onto the subspace $W_{k-1}$. The resulting vector $v_k$ is, by construction, orthogonal to the subspace, and hence orthogonal to all of the previous basis elements, which serves to rejustify the Gram-Schmidt construction.


Exercises 

Note :  Use  the  dot  product  and  Euclidean  norm  unless  otherwise  specified. 


4.4.1.  Determine  which  of  the  vectors  v1  = 


(~2\ 

(  2  \ 

1  .  v2  = 

2 

’  v3  = 

-1 

>v4  = 

3 

Vo; 

V  2  J 

V-3  J 

v  4 ; 

,  is 


o 

i 

(  l\ 

orthogonal  to  (a)  the  line  spanned  by 

3 

;  (b)  the  plane  spanned  by 

-i 

5 

i 

V  -2y 

V  o 

U/ 

(c)  the  plane  defined  by  x  —  y  —  z  =  0;  (d)  the  kernel  of  the  matrix  ^ 


1  -1  -1 
3  -2  -4 


(  —3 

n 

(  - 1 

0 

3  \ 

(e)  the  image  of  the  matrix 

3 

-i 

;  (f)  the  cokernel  of  the  matrix 

2 

1 

-2 

^-1 

o) 

^  3 

1 

-5/ 

T 

4.4.2.  Find  the  orthogonal  projection  of  the  vector  v  =  (1,1,1)  onto  the  following  subspaces, 
using  the  indicated  orthonormal/orthogonal  bases:  (a)  the  line  in  the  direction 

;  (b)  the  line  spanned  by  (  2,  — 1,  3  )T;  (c)  the  plane  spanned  by 

(1, 1>0)T  ,(-2,2, 1  )T;  (d)  the  plane  spanned  by  ( |,o)T  ,  (  -j§  )T. 

T 

4.4.3.  Find  the  orthogonal  projection  of  v  =  (1,2,  —1,  2  )  onto  the  following  subspaces: 


(a)  the  span  of 


of  the  matrix 


/  1\ 

(  2  \ 

-1 

1 

2 

5 

0 

V  i  J 

^-1/ 

1  - 

1 

0  1 

-2 

1 

1  0 

;  (b)  the  image  of  the  matrix 


/  1 

2  \ 

-1 

1 

0 

3 

^-1 

;  (c)  the  kernel 


T 

;  (d)  the  subspace  orthogonal  to  a=  (1,— 1,0, 1)  . 


Warning.  Make  sure  you  have  an  orthogonal  basis  before  applying  formula  (4.42)! 


4.4.4.  Find  the  orthogonal  projection  of  the  vector 


/ 1\ 
2 

w 


onto  the  image  of 


4.4.5. Find the orthogonal projection of the vector $v = (\,1, 3, -1\,)^T$ onto the plane spanned by $(\,-1, 2, 1\,)^T$, $(\,2, 1, -3\,)^T$ by first using the Gram-Schmidt process to construct an orthogonal basis.




4.4.6. Find the orthogonal projection of $v = (\,1, 2, -1, 2\,)^T$ onto the span of $(\,1, -1, 2, 5\,)^T$ and $(\,2, 1, 0, -1\,)^T$ using the weighted inner product $\langle v, w \rangle = 4\,v_1 w_1 + 3\,v_2 w_2 + 2\,v_3 w_3 + v_4 w_4$.

4.4.7.  Redo  Exercise  4.4.2  using 

( i )  the  weighted  inner  product  ( v  ,  w )  =  2 P  2  v2w2  -j-  v3w3; 

(ii)  the  inner  product  induced  by  the  positive  definite  matrix  K  = 


/  2 
-1 

V  o 


0\ 
-1 
2  ) 


4.4.8. (a) Prove that the set of all vectors orthogonal to a given subspace $V \subset \mathbb{R}^m$ forms a subspace. (b) Find a basis for the set of all vectors in $\mathbb{R}^4$ that are orthogonal to the subspace spanned by $(\,1, 2, 0, -1\,)^T$, $(\,2, 0, 3, 1\,)^T$.


♥ 4.4.9. Let $u_1, \dots, u_k$ be an orthonormal basis for the subspace $W \subset \mathbb{R}^m$. Let $A = (\, u_1 \ u_2 \ \dots \ u_k \,)$ be the $m \times k$ matrix whose columns are the orthonormal basis vectors, and define $P = A A^T$ to be the corresponding projection matrix. (a) Given $v \in \mathbb{R}^m$, prove that its orthogonal projection $w \in W$ is given by matrix multiplication: $w = Pv$. (b) Prove that $P = P^T$ is symmetric. (c) Prove that $P$ is idempotent: $P^2 = P$. Give a geometrical explanation of this fact. (d) Prove that $\operatorname{rank} P = k$. (e) Write out the projection matrix corresponding to the subspaces spanned by


/  J_\ 

(  2 

3 

(  M 

U6 

(0 

F  ,  (**) 

Kn) 

2 

3 

V  V 

,  (in) 

2 

a/6 

1 

V  Ve  ' 

a/3 

1 

a/3 

V73  ) 


U 

u 

h\ 

1 

1 

l 

2 

2 

2 

1 

? 

1 

1 

2 

2 

2 

v-P 

\> 

P 

♥ 4.4.10. Let $w_1, \dots, w_n$ be an arbitrary basis of the subspace $W \subset \mathbb{R}^m$. Let $A = (\, w_1, \dots, w_n \,)$ be the $m \times n$ matrix whose columns are the basis vectors, so that $W = \operatorname{img} A$ and $\operatorname{rank} A = n$. (a) Prove that the corresponding projection matrix $P = A (A^T A)^{-1} A^T$ is idempotent: $P^2 = P$. (b) Prove that $P$ is symmetric. (c) Prove that $\operatorname{img} P = W$. (d) (e) Prove that the orthogonal projection $w \in W$ of $v \in \mathbb{R}^m$ is obtained by multiplying by the projection matrix: $w = Pv$. (f) Show that if $A$ is nonsingular, then $P = I$. How do you interpret this in light of part (e)? (g) Explain why Exercise 4.4.9 is a special case of this result. (h) Show that if $A = QR$ is the factorization of $A$ given in Exercise 4.3.32, then $P = Q Q^T$. Why is $P \neq I$?


4.4.11. Use the projection matrix method of Exercise 4.4.10 to find the orthogonal projection of $v = (1, 0, 0, 0)^T$ onto the image of the following matrices:


5  \ 

/  1  0\ 

2  -1\ 

/  0  1  -1\ 

(a) 

-5 

-7 

,  ( b ) 

-1  2 

0  -1 
^  1  2) 

.  (c) 

-3  1 

1  -2 

1  2  / 

.  (d) 

0-1  2 

111 
1-2  -1  0 ) 

Orthogonal  Subspaces 

We  now  extend  the  notion  of  orthogonality  from  individual  elements  to  entire  subspaces 
of  an  inner  product  space  V. 

Definition 4.34. Two subspaces $W, Z \subset V$ are called orthogonal if every vector in $W$ is orthogonal to every vector in $Z$.

In other words, $W$ and $Z$ are orthogonal subspaces if and only if $\langle w, z \rangle = 0$ for every $w \in W$ and $z \in Z$. In practice, one only needs to check orthogonality of basis elements, or, more generally, spanning sets.

Figure 4.5. Orthogonal Complement to a Line.

Lemma 4.35. If $w_1, \dots, w_k$ span $W$ and $z_1, \dots, z_l$ span $Z$, then $W$ and $Z$ are orthogonal subspaces if and only if $\langle w_i, z_j \rangle = 0$ for all $i = 1, \dots, k$ and $j = 1, \dots, l$.

The  proof  of  this  lemma  is  left  to  the  reader;  see  Exercise  4.4.26. 

Example 4.36. Let $V = \mathbb{R}^3$ have the ordinary dot product. Then the plane $W \subset \mathbb{R}^3$ defined by the equation $2x - y + 3z = 0$ is orthogonal to the line $Z$ spanned by its normal vector $n = (2, -1, 3)^T$. Indeed, every $w = (x, y, z)^T \in W$ satisfies the orthogonality condition $w \cdot n = 2x - y + 3z = 0$, which is simply the equation for the plane.

Example 4.37. Let $W$ be the span of $w_1 = (1, -2, 0, 1)^T$, $w_2 = (3, -5, 2, 1)^T$, and let $Z$ be the span of the vectors $z_1 = (3, 2, 0, 1)^T$, $z_2 = (1, 0, -1, -1)^T$. We find that $w_1 \cdot z_1 = w_1 \cdot z_2 = w_2 \cdot z_1 = w_2 \cdot z_2 = 0$, and so $W$ and $Z$ are orthogonal two-dimensional subspaces of $\mathbb{R}^4$ under the Euclidean dot product.
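
The spanning-set criterion of Lemma 4.35 is easy to check mechanically. The following is a minimal sketch for the Euclidean dot product only; the function names `dot` and `spans_are_orthogonal` are illustrative choices, not notation from the text. It tests the four pairwise dot products of Example 4.37.

```python
from itertools import product

def dot(u, v):
    """Euclidean dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def spans_are_orthogonal(w_vectors, z_vectors):
    """Lemma 4.35: the spans are orthogonal subspaces exactly when every
    spanning vector of W is orthogonal to every spanning vector of Z."""
    return all(dot(w, z) == 0 for w, z in product(w_vectors, z_vectors))

# Spanning vectors from Example 4.37
W_span = [(1, -2, 0, 1), (3, -5, 2, 1)]
Z_span = [(3, 2, 0, 1), (1, 0, -1, -1)]

print(spans_are_orthogonal(W_span, Z_span))
```

Only $k \cdot l$ dot products are needed, however large the subspaces themselves are.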


Definition 4.38. The orthogonal complement of a subspace $W \subset V$, denoted† $W^\perp$, is defined as the set of all vectors that are orthogonal to $W$:

\[ W^\perp = \{\, v \in V \mid \langle v, w \rangle = 0 \ \text{for all}\ w \in W \,\}. \tag{4.44} \]

If $W$ is the one-dimensional subspace (line) spanned by a single vector $w \ne 0$, then we also denote $W^\perp$ by $w^\perp$, as in (4.36). One easily checks that the orthogonal complement $W^\perp$ is also a subspace. Moreover, $W \cap W^\perp = \{0\}$. (Why?) Keep in mind that the orthogonal complement will depend upon which inner product is being used.


Example 4.39. Let $W = \{\, (t, 2t, 3t)^T \mid t \in \mathbb{R} \,\}$ be the line (one-dimensional subspace) in the direction of the vector $w_1 = (1, 2, 3)^T \in \mathbb{R}^3$. Under the dot product, its orthogonal complement $W^\perp = w_1^\perp$ is the plane passing through the origin having normal vector $w_1$, as sketched in Figure 4.5. In other words, $z = (x, y, z)^T \in W^\perp$ if and only if

\[ z \cdot w_1 = x + 2y + 3z = 0. \tag{4.45} \]

Thus, $W^\perp$ is characterized as the solution space of the homogeneous linear equation (4.45), or, equivalently, the kernel of the $1 \times 3$ matrix $A = w_1^T = (1\ 2\ 3)$. We can write the general solution in the form

\[ z = \begin{pmatrix} -2y - 3z \\ y \\ z \end{pmatrix} = y\, z_1 + z\, z_2, \]

where $y, z$ are the free variables. The indicated vectors $z_1 = (-2, 1, 0)^T$, $z_2 = (-3, 0, 1)^T$ form a (non-orthogonal) basis for the orthogonal complement $W^\perp$.

† And usually pronounced “W perp”.

Figure 4.6. Orthogonal Decomposition of a Vector.
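
Under the dot product, the orthogonal complement of the span of some vectors is the kernel of the matrix having those vectors as rows, so it can be computed by row reduction. The sketch below is one possible implementation in exact rational arithmetic; the helper name `kernel_basis` is an assumption made for illustration. It sets each free variable to 1 in turn, which reproduces the basis $z_1, z_2$ found above.

```python
from fractions import Fraction

def kernel_basis(rows):
    """Basis of {x : A x = 0}, where A has the given rows, obtained from
    the reduced row echelon form (one basis vector per free variable)."""
    A = [[Fraction(a) for a in row] for row in rows]
    n = len(A[0])
    pivots = []          # pivot column of each reduced row, in order
    r = 0                # index of the next pivot row
    for c in range(n):
        # look for a nonzero entry in column c at or below row r
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue     # no pivot here: column c is a free variable
        A[r], A[p] = A[p], A[r]
        piv = A[r][c]
        A[r] = [a / piv for a in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == len(A):
            break
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            x[pc] = -A[i][free]
        basis.append(x)
    return basis

# W is spanned by w1 = (1, 2, 3), so W-perp is the kernel of the 1 x 3 matrix (1 2 3)
print(kernel_basis([[1, 2, 3]]))
```

The same routine applies when $W$ is spanned by several vectors: pass one row per spanning vector.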


Proposition 4.40. Suppose that $W \subset V$ is a finite-dimensional subspace of an inner product space. Then every vector $v \in V$ can be uniquely decomposed into $v = w + z$, where $w \in W$ and $z \in W^\perp$.

Proof: We let $w \in W$ be the orthogonal projection of $v$ onto $W$. Then $z = v - w$ is, by definition, orthogonal to $W$ and hence belongs to $W^\perp$. Note that $z$ can be viewed as the orthogonal projection of $v$ onto the complementary subspace $W^\perp$ (provided it is finite-dimensional). If we are given two such decompositions, $v = w + z = \tilde w + \tilde z$, then $w - \tilde w = \tilde z - z$. The left-hand side of this equation lies in $W$, while the right-hand side belongs to $W^\perp$. But, as we already noted, the only vector that belongs to both $W$ and $W^\perp$ is the zero vector. Thus, $w - \tilde w = 0 = \tilde z - z$, so $w = \tilde w$ and $z = \tilde z$, which proves uniqueness. Q.E.D.

As a direct consequence of Exercise 2.4.26, in a finite-dimensional inner product space, a subspace and its orthogonal complement have complementary dimensions:

Proposition 4.41. If $W \subset V$ is a subspace with $\dim W = n$ and $\dim V = m$, then $\dim W^\perp = m - n$.


Example 4.42. Return to the situation described in Example 4.39. Let us decompose the vector $v = (1, 0, 0)^T \in \mathbb{R}^3$ into a sum $v = w + z$ of a vector $w$ lying on the line $W$ and a vector $z$ belonging to its orthogonal plane $W^\perp$, defined by (4.45). Each is obtained by an orthogonal projection onto the subspace in question, but we only need to compute one of the two directly, since the second can be obtained by subtracting the first from $v$.

Orthogonal projection onto a one-dimensional subspace is easy, since every basis is, trivially, an orthogonal basis. Thus, the projection of $v$ onto the line spanned by $w_1 = (1, 2, 3)^T$ is

\[ w = \frac{\langle v, w_1 \rangle}{\| w_1 \|^2}\, w_1 = \left( \tfrac{1}{14}, \tfrac{2}{14}, \tfrac{3}{14} \right)^T. \]

The component in $W^\perp$ is then obtained by subtraction:

\[ z = v - w = \left( \tfrac{13}{14}, -\tfrac{2}{14}, -\tfrac{3}{14} \right)^T. \]

Alternatively, one can obtain $z$ directly by orthogonal projection onto the plane $W^\perp$. But you need to be careful: the basis found in Example 4.39 is not orthogonal, and so you will need to either first convert to an orthogonal basis and then use the orthogonal projection formula (4.42), or apply the more direct result in Exercise 4.4.10.
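
The decomposition in Example 4.42 can be reproduced in a few lines. This is only a sketch for the dot product and a one-dimensional subspace; `decompose_along_line` is a hypothetical helper name. Exact fractions are used so the entries can be compared with the values above (note that `Fraction` reduces $\tfrac{2}{14}$ to $\tfrac{1}{7}$).

```python
from fractions import Fraction

def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))

def decompose_along_line(v, w1):
    """Split v = w + z, with w on the line spanned by w1 and z orthogonal to w1."""
    c = Fraction(dot(v, w1), dot(w1, w1))   # coefficient <v, w1> / ||w1||^2
    w = [c * a for a in w1]                 # projection onto the line
    z = [a - b for a, b in zip(v, w)]       # remainder, lying in the orthogonal plane
    return w, z

w, z = decompose_along_line([1, 0, 0], [1, 2, 3])
print(w)
print(z)
```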


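Example 4.43. Let $W \subset \mathbb{R}^4$ be the two-dimensional subspace spanned by the orthogonal vectors $w_1 = (1, 1, 0, 1)^T$ and $w_2 = (1, 1, 1, -2)^T$. Its orthogonal complement $W^\perp$ (with respect to the Euclidean dot product) is the set of all vectors $v = (x, y, z, w)^T$ that satisfy the linear system

\[ v \cdot w_1 = x + y + w = 0, \qquad v \cdot w_2 = x + y + z - 2w = 0. \]

Applying the usual algorithm (the free variables are $y$ and $w$), we find that the solution space is spanned by

\[ z_1 = (-1, 1, 0, 0)^T, \qquad z_2 = (-1, 0, 3, 1)^T, \]

which form a non-orthogonal basis for $W^\perp$. An orthogonal basis

\[ y_1 = z_1 = (-1, 1, 0, 0)^T, \qquad y_2 = z_2 - \tfrac{1}{2}\, z_1 = \left( -\tfrac{1}{2}, -\tfrac{1}{2}, 3, 1 \right)^T, \]

for $W^\perp$ is obtained by a single Gram-Schmidt step. To decompose the vector $v = (1, 0, 0, 0)^T = w + z$, say, we compute the two orthogonal projections:

\[ w = \tfrac{1}{3}\, w_1 + \tfrac{1}{7}\, w_2 = \left( \tfrac{10}{21}, \tfrac{10}{21}, \tfrac{1}{7}, \tfrac{1}{21} \right)^T \in W, \]

\[ z = v - w = -\tfrac{1}{2}\, y_1 - \tfrac{1}{21}\, y_2 = \left( \tfrac{11}{21}, -\tfrac{10}{21}, -\tfrac{1}{7}, -\tfrac{1}{21} \right)^T \in W^\perp. \]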
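
As a check on Example 4.43, the sketch below carries out the single Gram-Schmidt step and both orthogonal projections in exact arithmetic. The helper names `dot` and `project` are illustrative; `project` assumes its basis vectors are mutually orthogonal, which is exactly the situation in which the projection formula applies term by term.

```python
from fractions import Fraction

def dot(u, v):
    """Euclidean dot product, computed exactly."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))

def project(v, orthogonal_basis):
    """Orthogonal projection of v onto the span of mutually orthogonal vectors."""
    result = [Fraction(0)] * len(v)
    for u in orthogonal_basis:
        c = dot(v, u) / dot(u, u)
        result = [r + c * a for r, a in zip(result, u)]
    return result

w1, w2 = [1, 1, 0, 1], [1, 1, 1, -2]      # orthogonal basis of W
z1, z2 = [-1, 1, 0, 0], [-1, 0, 3, 1]     # non-orthogonal basis of W-perp

# single Gram-Schmidt step: remove from z2 its component along y1 = z1
y1 = z1
c = dot(z2, y1) / dot(y1, y1)
y2 = [b - c * a for a, b in zip(y1, z2)]

v = [1, 0, 0, 0]
w = project(v, [w1, w2])    # component in W
z = project(v, [y1, y2])    # component in W-perp
print(y2, w, z)
```

Adding the two projections recovers $v$, as Proposition 4.40 predicts.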


Proposition 4.44. If $W$ is a finite-dimensional subspace of an inner product space, then $(W^\perp)^\perp = W$.

This result is a corollary of the orthogonal decomposition derived in Proposition 4.40.

Warning. Propositions 4.40 and 4.44 are not necessarily true for infinite-dimensional subspaces. If $\dim W = \infty$, one can assert only that $W \subseteq (W^\perp)^\perp$. For example, it can be shown, [19; Exercise 10.2.D], that on every bounded interval $[a, b]$ the orthogonal complement of the subspace of all polynomials $\mathcal{P}^{(\infty)} \subset C^0[a, b]$ with respect to the $L^2$ inner product is trivial: $(\mathcal{P}^{(\infty)})^\perp = \{0\}$. This means that the only continuous function that satisfies

\[ \langle x^n, f(x) \rangle = \int_a^b x^n f(x)\, dx = 0 \qquad \text{for all} \quad n = 0, 1, 2, \dots \]

is the zero function $f(x) \equiv 0$. But the orthogonal complement of $\{0\}$ is the entire space, and so $((\mathcal{P}^{(\infty)})^\perp)^\perp = C^0[a, b] \ne \mathcal{P}^{(\infty)}$.


The  difference  is  that,  in  infinite-dimensional  function  space,  a  proper  subspace  W  C  V 
can be dense,† whereas in finite dimensions, every proper subspace is a “thin” subset that
occupies  only  an  infinitesimal  fraction  of  the  entire  vector  space.  However,  this  seeming 
paradox  is,  interestingly,  the  reason  behind  the  success  of  numerical  approximation  schemes 
in  function  space,  such  as  the  finite  element  method,  [81 


Exercises 

Note: In Exercises 4.4.12-15, use the dot product.

4.4.12. Find the orthogonal complement $W^\perp$ of the subspaces $W \subset \mathbb{R}^3$ spanned by the indicated vectors. What is the dimension of $W^\perp$ in each case?


(  3\ 

(a) 

-1 

.  (t>) 

2 1 

0 

V 

V  3  / 

d/ 

.  (c) 


/1\ 

2 

w 


(2\ 

4 

W 


.  (d) 


•2  \ 
3 


/  — 1  \ 
2 


1/  V  o  J 


(e) 


fl\ 

1 

W 


(1\ 

0 

W 


4.4.13. Find a basis for the orthogonal complement of the following subspaces of $\mathbb{R}^3$: (a) the plane $3x + 4y - 5z = 0$; (b) the line in the direction $(-2, 1, 3)^T$; (c) the image of the matrix

\[ \begin{pmatrix} 1 & 2 & -1 & 3 \\ 2 & 0 & 2 & 1 \\ 1 & 2 & 1 & 4 \end{pmatrix}; \]

(d) the cokernel of the same matrix.


4.4.14. Find a basis for the orthogonal complement of the following subspaces of $\mathbb{R}^4$: (a) the set of solutions to $-x + 3y - 2z + w = 0$; (b) the subspace spanned by $(1, 2, -1, 3)^T$, $(-2, 0, 1, -2)^T$, $(-1, 2, 0, 1)^T$; (c) the kernel of the matrix in Exercise 4.4.13c; (d) the coimage of the same matrix.

4.4.15. Decompose each of the following vectors with respect to the indicated subspace as $v = w + z$, where $w \in W$, $z \in W^\perp$.

(b) v =

(a)

V  = 

i\ 

( 

(  - 3\ 

2 

,  IT  =  span  < 

2 

5 

0 

l 

V  i ) 

^  5J 

;  (c)  v  = 


(d)  v  = 


/ 1  \ 
0 

W 


,  W  =  img 


/ 


V 


1 

2 

1 


0 


1\ 


1  0 
3  — 5  / 


;  (e)  V  = 


/1\ 
0 
0 

Vi/ 


( i\ 

0 

Vo/ 


,  W  =  ker 


1 

2 


2 

0 


,  W  =  ker 


1 

2 


0  0 
1  1 


1 

2 

2 

-3 


4.4.16. Redo Exercise 4.4.12 using the weighted inner product $\langle v, w \rangle = v_1 w_1 + 2 v_2 w_2 + 3 v_3 w_3$ instead of the dot product.

4.4.17. Redo Example 4.43 with the dot product replaced by the weighted inner product $\langle v, w \rangle = v_1 w_1 + 2 v_2 w_2 + 3 v_3 w_3 + 4 v_4 w_4$.

♦ 4.4.18. Prove that the orthogonal complement $W^\perp$ of a subspace $W \subset V$ is itself a subspace.

† In general, a subset $W \subset V$ of a normed vector space is dense if, for every $v \in V$, and every $\varepsilon > 0$, one can find $w \in W$ with $\| v - w \| < \varepsilon$. The Weierstrass Approximation Theorem, [19; Theorem 10.2.2], tells us that the polynomials form a dense subspace of the space of continuous functions, and underlies the proof of the result mentioned in the preceding paragraph.

4.4.19. Let $V = \mathcal{P}^{(4)}$ denote the space of quartic polynomials, with the $L^2$ inner product $\langle p, q \rangle = \int_{-1}^{1} p(x)\, q(x)\, dx$. Let $W = \mathcal{P}^{(2)}$ be the subspace of quadratic polynomials. (a) Write down the conditions that a polynomial $p \in \mathcal{P}^{(4)}$ must satisfy in order to belong to the orthogonal complement $W^\perp$. (b) Find a basis for and the dimension of $W^\perp$. (c) Find an orthogonal basis for $W^\perp$.

4.4.20. Let $W \subset V$. Prove that (a) $W \cap W^\perp = \{0\}$, (b) $W \subseteq (W^\perp)^\perp$.

4.4.21. Let $V$ be an inner product space. Prove that (a) $V^\perp = \{0\}$, (b) $\{0\}^\perp = V$.

4.4.22. Prove that if $W_1 \subset W_2$ are finite-dimensional subspaces of an inner product space, then $W_1^\perp \supset W_2^\perp$.

4.4.23. (a) Show that if $W, Z \subset \mathbb{R}^n$ are complementary subspaces, then $W^\perp$ and $Z^\perp$ are also complementary subspaces. (b) Sketch a picture illustrating this result when $W$ and $Z$ are lines in $\mathbb{R}^2$.

4.4.24. Prove that if $W, Z$ are subspaces of an inner product space, then $(W + Z)^\perp = W^\perp \cap Z^\perp$. (See Exercise 2.2.22(b) for the definition of the sum of two subspaces.)

♦ 4.4.25. Fill in the details of the proof of Proposition 4.44.

♦ 4.4.26. Prove Lemma 4.35.

♦ 4.4.27. Let $W \subset V$ with $\dim V = n$. Suppose $w_1, \dots, w_m$ is an orthogonal basis for $W$ and $w_{m+1}, \dots, w_n$ is an orthogonal basis for $W^\perp$. (a) Prove that the combination $w_1, \dots, w_n$ forms an orthogonal basis of $V$. (b) Show that if $v = c_1 w_1 + \cdots + c_n w_n$ is any vector in $V$, then its orthogonal decomposition $v = w + z$ is given by $w = c_1 w_1 + \cdots + c_m w_m \in W$ and $z = c_{m+1} w_{m+1} + \cdots + c_n w_n \in W^\perp$.

♥ 4.4.28. Consider the subspace $W = \{\, u(a) = 0 = u(b) \,\}$ of the vector space $C^0[a, b]$ with the usual $L^2$ inner product. (a) Show that $W$ has a complementary subspace of dimension 2. (b) Prove that there does not exist an orthogonal complement of $W$. Thus, an infinite-dimensional subspace may not admit an orthogonal complement!


Orthogonality of the Fundamental Matrix Subspaces and the Fredholm Alternative

In Chapter 2, we introduced the four fundamental subspaces associated with an $m \times n$ matrix $A$. According to the Fundamental Theorem 2.49, the first two, the kernel (null space) and the coimage (row space), are subspaces of $\mathbb{R}^n$ having complementary dimensions. The second two, the cokernel (left null space) and the image (column space), are subspaces of $\mathbb{R}^m$, also of complementary dimensions. In fact, more than this is true: the paired subspaces are orthogonal complements with respect to the standard Euclidean dot product!

Theorem 4.45. Let $A$ be a real $m \times n$ matrix. Then its kernel and coimage are orthogonal complements as subspaces of $\mathbb{R}^n$ under the dot product, while its cokernel and image are orthogonal complements in $\mathbb{R}^m$, also under the dot product:

\[ \ker A = (\operatorname{coimg} A)^\perp \subset \mathbb{R}^n, \qquad \operatorname{coker} A = (\operatorname{img} A)^\perp \subset \mathbb{R}^m. \tag{4.46} \]

Proof: A vector $x \in \mathbb{R}^n$ lies in $\ker A$ if and only if $Ax = 0$. According to the rules of matrix multiplication, the $i$-th entry of $Ax$ equals the vector product of the $i$-th row $r_i^T$ of $A$ and $x$. But this product vanishes, $r_i^T x = r_i \cdot x = 0$, if and only if $x$ is orthogonal to $r_i$. Therefore, $x \in \ker A$ if and only if $x$ is orthogonal to all the rows of $A$. Since the rows span $\operatorname{coimg} A$, this is equivalent to $x$ lying in its orthogonal complement $(\operatorname{coimg} A)^\perp$, which proves the first statement. Orthogonality of the image and cokernel follows by the same argument applied to the transposed matrix $A^T$. Q.E.D.

Figure 4.7. The Fundamental Matrix Subspaces.

Combining Theorems 2.49 and 4.45, we deduce the following important characterization of compatible linear systems.

Theorem 4.46. A linear system $Ax = b$ has a solution if and only if $b$ is orthogonal to the cokernel of $A$.

Indeed, the system has a solution if and only if the right-hand side belongs to the image of the coefficient matrix, $b \in \operatorname{img} A$, which, by (4.46), requires that $b$ be orthogonal to its cokernel. Thus, the compatibility conditions for the linear system $Ax = b$ can be written in the form

\[ y \cdot b = 0 \qquad \text{for every } y \text{ satisfying } A^T y = 0. \tag{4.47} \]

In practice, one only needs to check orthogonality of $b$ with respect to a basis $y_1, \dots, y_{m-r}$ of the cokernel, leading to a system of $m - r$ compatibility constraints

\[ y_i \cdot b = 0, \qquad i = 1, \dots, m - r. \tag{4.48} \]

Here $r = \operatorname{rank} A$ denotes the rank of the coefficient matrix, and so $m - r$ is also the number of all zero rows in the row echelon form of $A$. Hence, (4.48) contains precisely the same number of constraints as would be derived using Gaussian Elimination.

Theorem 4.46 is known as the Fredholm alternative, named after the Swedish mathematician Ivar Fredholm. His primary motivation was to solve linear integral equations, but his compatibility criterion was recognized to be a general property of linear systems, including linear algebraic systems, linear differential equations, linear boundary value problems, and so on.

Example 4.47. In Example 2.40, we analyzed the linear system $Ax = b$ with coefficient matrix

\[ A = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -2 \\ 1 & -2 & 3 \end{pmatrix}. \]

Using direct Gaussian Elimination, we were led to a single compatibility condition, namely $-b_1 + 2 b_2 + b_3 = 0$, required for the system to have a solution. We now understand the meaning behind this equation: it is telling us that the right-hand side $b$ must be orthogonal to the cokernel of $A$. The cokernel is determined by solving the homogeneous adjoint system $A^T y = 0$, and is the line spanned by the vector $y_1 = (-1, 2, 1)^T$. Thus, the compatibility condition requires that $b$ be orthogonal to $y_1$, in accordance with the Fredholm alternative (4.48).

Example 4.48. Let us determine the compatibility conditions for the linear system

\[ x_1 - x_2 + 3x_3 = b_1, \quad -x_1 + 2x_2 - 4x_3 = b_2, \quad 2x_1 + 3x_2 + x_3 = b_3, \quad x_1 + 2x_3 = b_4, \]

by computing the cokernel of its coefficient matrix

\[ A = \begin{pmatrix} 1 & -1 & 3 \\ -1 & 2 & -4 \\ 2 & 3 & 1 \\ 1 & 0 & 2 \end{pmatrix}. \]

We need to solve the homogeneous adjoint system $A^T y = 0$, namely

\[ y_1 - y_2 + 2y_3 + y_4 = 0, \quad -y_1 + 2y_2 + 3y_3 = 0, \quad 3y_1 - 4y_2 + y_3 + 2y_4 = 0. \]

Applying Gaussian Elimination, we deduce that the general solution

\[ y = y_3\, (-7, -5, 1, 0)^T + y_4\, (-2, -1, 0, 1)^T \]

is a linear combination (whose coefficients are the free variables) of the two basis vectors for $\operatorname{coker} A$. Thus, the Fredholm compatibility conditions (4.48) are obtained by taking their dot products with the right-hand side of the original system:

\[ -7 b_1 - 5 b_2 + b_3 = 0, \qquad -2 b_1 - b_2 + b_4 = 0. \]

The reader can check that these are indeed the same compatibility conditions that result from a direct Gaussian Elimination on the augmented matrix $(A \mid b)$.
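
The Fredholm test of Example 4.48 is simple to run numerically. In this sketch the coefficient matrix is read off from the four equations of the system and the two cokernel basis vectors are taken from the general solution of the adjoint system; the function names are illustrative assumptions. The code first confirms that each basis vector satisfies $A^T y = 0$, then applies the compatibility conditions (4.48) to a right-hand side.

```python
def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))

# Coefficient matrix of the system in Example 4.48
A = [[1, -1, 3],
     [-1, 2, -4],
     [2, 3, 1],
     [1, 0, 2]]

# Basis of coker A found by solving the adjoint system
cokernel_basis = [(-7, -5, 1, 0), (-2, -1, 0, 1)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def satisfies_adjoint_system(y):
    """True if A^T y = 0, i.e. y lies in the cokernel of A."""
    return all(dot(row, y) == 0 for row in transpose(A))

def is_compatible(b):
    """Fredholm conditions (4.48): b must be orthogonal to each cokernel basis vector."""
    return all(dot(y, b) == 0 for y in cokernel_basis)

def apply(M, x):
    return [dot(row, x) for row in M]

print(all(satisfies_adjoint_system(y) for y in cokernel_basis))
print(is_compatible(apply(A, [1, 1, 1])))   # a right-hand side known to lie in img A
print(is_compatible([1, 0, 0, 0]))          # fails the first condition
```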

Remark. Conversely, rather than solving the homogeneous adjoint system, we can use Gaussian Elimination on the augmented matrix $(A \mid b)$ to determine the $m - r$ basis vectors $y_1, \dots, y_{m-r}$ for $\operatorname{coker} A$. They are formed from the coefficients of $b_1, \dots, b_m$ in the $m - r$ consistency conditions $y_i \cdot b = 0$ for $i = 1, \dots, m - r$, arising from the all zero rows in the reduced row echelon form.

We are now very close to a full understanding of the fascinating geometry that lurks behind the simple algebraic operation of multiplying a vector $x \in \mathbb{R}^n$ by an $m \times n$ matrix, resulting in a vector $b = Ax \in \mathbb{R}^m$. Since the kernel and coimage of $A$ are orthogonal complements in the domain space $\mathbb{R}^n$, Proposition 4.40 tells us that we can uniquely decompose $x = w + z$, where $w \in \operatorname{coimg} A$, while $z \in \ker A$. Since $Az = 0$, we have

\[ b = Ax = A(w + z) = Aw. \]

Therefore, we can regard multiplication by $A$ as a combination of two operations:

(i) The first is an orthogonal projection onto the coimage of $A$ taking $x$ to $w$.

(ii) The second maps a vector in $\operatorname{coimg} A \subset \mathbb{R}^n$ to a vector in $\operatorname{img} A \subset \mathbb{R}^m$, taking the orthogonal projection $w$ to the image vector $b = Aw = Ax$.

Moreover, if $A$ has rank $r$, then both $\operatorname{img} A$ and $\operatorname{coimg} A$ are $r$-dimensional subspaces, albeit of different vector spaces. Each vector $b \in \operatorname{img} A$ corresponds to a unique vector $w \in \operatorname{coimg} A$. Indeed, if $w, \tilde w \in \operatorname{coimg} A$ satisfy $b = Aw = A\tilde w$, then $A(w - \tilde w) = 0$, and hence $w - \tilde w \in \ker A$. But, since the kernel and the coimage are orthogonal complements, the only vector that belongs to both is the zero vector, and hence $w = \tilde w$. In this manner, we have proved the first part of the following result; the second is left as Exercise 4.4.38.

Theorem 4.49. Multiplication by an $m \times n$ matrix $A$ of rank $r$ defines a one-to-one correspondence between the $r$-dimensional subspaces $\operatorname{coimg} A \subset \mathbb{R}^n$ and $\operatorname{img} A \subset \mathbb{R}^m$. Moreover, if $v_1, \dots, v_r$ forms a basis of $\operatorname{coimg} A$, then their images $Av_1, \dots, Av_r$ form a basis for $\operatorname{img} A$.

In summary, the linear system $Ax = b$ has a solution if and only if $b \in \operatorname{img} A$, or, equivalently, is orthogonal to every vector $y \in \operatorname{coker} A$. If the compatibility conditions hold, then the system has a unique solution $w \in \operatorname{coimg} A$ that, by the definition of the coimage, is a linear combination of the rows of $A$. The general solution to the system is $x = w + z$, where $w$ is the particular solution belonging to the coimage, while $z \in \ker A$ is an arbitrary element of the kernel.

Theorem 4.50. A compatible linear system $Ax = b$ with $b \in \operatorname{img} A = (\operatorname{coker} A)^\perp$ has a unique solution $w \in \operatorname{coimg} A$ satisfying $Aw = b$. The general solution is $x = w + z$, where $z \in \ker A$. The particular solution $w \in \operatorname{coimg} A$ is distinguished by the fact that it has the smallest Euclidean norm of all possible solutions: $\| w \| \le \| x \|$ whenever $Ax = b$.

Proof: We have already established all but the last statement. Since the coimage and kernel are orthogonal subspaces, the norm of a general solution $x = w + z$ is

\[ \| x \|^2 = \| w + z \|^2 = \| w \|^2 + 2\, w \cdot z + \| z \|^2 = \| w \|^2 + \| z \|^2 \ge \| w \|^2, \]

with equality if and only if $z = 0$. Q.E.D.

In practice, to determine the unique minimum-norm solution to a compatible linear system, we invoke the orthogonality of the coimage and kernel of the coefficient matrix. Thus, if $z_1, \dots, z_{n-r}$ form a basis for $\ker A$, then the minimum-norm solution $x = w \in \operatorname{coimg} A$ is obtained by solving the enlarged system

\[ Ax = b, \qquad z_1^T x = 0, \quad \dots \quad z_{n-r}^T x = 0. \tag{4.49} \]

The associated $(m + n - r) \times n$ coefficient matrix is simply obtained by appending the (transposed) kernel vectors to the original matrix $A$. The resulting matrix is guaranteed to have maximum rank $n$, and so, assuming $b \in \operatorname{img} A$, the enlarged system has a unique solution, which is the minimum-norm solution to the original system $Ax = b$.

Example 4.51. Consider the linear system

\[ \begin{pmatrix} 1 & -1 & 2 & -2 \\ 0 & 1 & -2 & 1 \\ 1 & 3 & -5 & 2 \\ 5 & -1 & 9 & -6 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \\ 4 \\ 6 \end{pmatrix}. \tag{4.50} \]

Applying the usual Gaussian Elimination algorithm, we discover that the coefficient matrix has rank 3, and its kernel is spanned by the single vector $z_1 = (1, -1, 0, 1)^T$. The system itself is compatible; indeed, the right-hand side is orthogonal to the basis cokernel vector $(2, 24, -7, 1)^T$, and so satisfies the Fredholm condition (4.48). The general solution to the linear system is $x = (t, 3 - t, 1, t)^T$, where $t = w$ is the free variable.

To find the solution of minimum Euclidean norm, we can apply the algorithm described in the previous paragraph.† Thus, we supplement the original system by the constraint

\[ (1\ \ {-1}\ \ 0\ \ 1) \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = x - y + w = 0 \tag{4.51} \]

that the solution be orthogonal to the kernel basis vector. Solving the combined linear system (4.50-51) leads to the unique solution $x = w = (1, 2, 1, 1)^T$, obtained by setting the free variable $t$ equal to 1. Let us check that its norm is indeed the smallest among all solutions to the original system:

\[ \| w \| = \sqrt{7} \le \| x \| = \| (t, 3 - t, 1, t)^T \| = \sqrt{3t^2 - 6t + 10}, \]

where the quadratic function inside the square root achieves its minimum value of 7 at $t = 1$. It is further distinguished as the only solution that can be expressed as a linear combination of the rows of the coefficient matrix:

\[ w^T = (1, 2, 1, 1) = -4\, (1, -1, 2, -2) - 17\, (0, 1, -2, 1) + 5\, (1, 3, -5, 2), \]

meaning that $w$ lies in the coimage of the coefficient matrix.
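
The minimum-norm solution of Example 4.51 can also be reached by a slightly different route than the enlarged system (4.49): start from any particular solution and subtract its component along the kernel. The sketch below does this for the one-dimensional kernel of this example, starting from the general solution at $t = 0$; the helper name `minimum_norm_solution` is hypothetical, and the shortcut as written assumes the kernel is spanned by a single vector.

```python
from fractions import Fraction

def dot(u, v):
    """Euclidean dot product, computed exactly."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))

A = [[1, -1, 2, -2],
     [0, 1, -2, 1],
     [1, 3, -5, 2],
     [5, -1, 9, -6]]
b = [-1, 1, 4, 6]
z1 = [1, -1, 0, 1]            # basis vector of ker A
x_particular = [0, 3, 1, 0]   # general solution (t, 3 - t, 1, t) at t = 0

def minimum_norm_solution(x, kernel_vector):
    """Remove from x its component along the kernel vector, leaving the
    solution that is orthogonal to ker A and hence lies in coimg A."""
    c = dot(x, kernel_vector) / dot(kernel_vector, kernel_vector)
    return [Fraction(a) - c * k for a, k in zip(x, kernel_vector)]

w = minimum_norm_solution(x_particular, z1)
print(w)                          # the solution orthogonal to the kernel
print([dot(row, w) for row in A]) # reproduces the right-hand side b
print(dot(w, w))                  # squared Euclidean norm
```

For a kernel of dimension greater than one, the same idea applies after first replacing the kernel basis by an orthogonal one.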


Exercises 


4.4.29. For each of the following matrices $A$, (i) find a basis for each of the four fundamental subspaces; (ii) verify that the image and cokernel are orthogonal complements; (iii) verify that the coimage and kernel are orthogonal complements:


(a)


(b)

4.4.30. For each of the following matrices, use Gaussian elimination on the augmented matrix


(a) 


4.4.31.  Let  A  = 


( 


\ 


1 

2 

1 


( b ) 

-2 

4 

2 


1 

2 


3  \ 

(11  3  \ 

(  1  -2  -2\ 

6 

,  (c) 

-1  1  -2 

,  (d) 

0-1  3 

2  -5  -1 

9/ 

A  3  6) 

V  —2  2  10 ) 

2 

3 

0 


-1\ 

5

. (a) Find a basis for $\operatorname{coimg} A$. (b) Use Theorem 4.49 to find a basis of $\operatorname{img} A$. (c) Write each column of $A$ as a linear combination of the basis vectors you found in part (b).

† An alternative is to orthogonally project the general solution onto the coimage. The result is the same.

4.4.32. Write down the compatibility conditions on the following systems of linear equations by first computing a basis for the cokernel of the coefficient matrix.
(a) $2x + y = a$, $x + 4y = b$, $-3x + 2y = c$;
(b) $x + 2y + 3z = a$, $-x + 5y - 2z = b$, $2x - 3y + 5z = c$;
(c) $x_1 + 2x_2 + 3x_3 = b_1$, $x_2 + 2x_3 = b_2$, $3x_1 + 5x_2 + 7x_3 = b_3$, $-2x_1 + x_2 + 4x_3 = b_4$;
(d) $x - 3y + 2z + w = a$, $4x - 2y + 2z + 3w = b$, $5x - 5y + 4z + 4w = c$, $2x + 4y - 2z + w = d$.

4.4.33. For each of the following $m \times n$ matrices, decompose the first standard basis vector $e_1 = w + z \in \mathbb{R}^n$, where $w \in \operatorname{coimg} A$ and $z \in \ker A$. Verify your answer by expressing $w$ as a linear combination of the rows of $A$.


(a) 


1-2  1 
2-3  2 


>  (b) 


/ 


V 


1 

1 

■2 


1 

0 

1 


>  (c) 


/I  -1  0  3  \ 

2  13  3 

\1  230/ 


>  (d) 


-11  1-12 
-3  2-1-2  0 


4.4.34. For each of the following linear systems, (i) verify compatibility using the Fredholm alternative, (ii) find the general solution, and (iii) find the solution of minimum Euclidean norm:

(a) $2x - 4y = -6$, $-x + 2y = 3$;
(b) $2x + 3y = -1$, $3x + 7y = 1$, $-3x + 2y = 8$;
(c) $6x - 3y + 9z = 12$, $2x - y + 3z = 4$;
(d) $x + 3y + 5z = 3$, $-x + 4y + 9z = 11$, $2x + 3y + 4z = 0$;
(e) $x_1 - 3x_2 + 7x_3 = -8$, $2x_1 + x_2 = 5$, $4x_1 - 3x_2 + 10x_3 = -5$, $-2x_1 + 2x_2 - 6x_3 = 4$;
(f) $x - y + 2z + 3w = 5$, $3x - 3y + 5z + 7w = 13$, $-2x + 2y + z + 4w = 0$.

4.4.35. Show that if $A = A^T$ is a symmetric matrix, then $Ax = b$ has a solution if and only if $b$ is orthogonal to $\ker A$.

♦ 4.4.36. Suppose $v_1, \dots, v_n$ span a subspace $V \subset \mathbb{R}^m$. Prove that $w$ is orthogonal to $V$ if and only if $w \in \operatorname{coker} A$, where $A = (v_1\ v_2\ \dots\ v_n)$ is the matrix with the indicated columns.


4.4.37.  Let  A  = 


/ 

1 

-1 

0 

2  \ 

2 

-2 

0 

4 

-1 

1 

1 

-1 

\ 

0 

0 

2 

2  / 

. (a) Find an orthogonal basis for $\operatorname{coimg} A$. (b) Find an orthogonal basis for $\ker A$. (c) If you combine your bases from parts (a) and (b), do you get an orthogonal basis of $\mathbb{R}^4$? Why or why not?

♦ 4.4.38. Prove that if $v_1, \dots, v_r$ are a basis of $\operatorname{coimg} A$, then their images $Av_1, \dots, Av_r$ are a basis for $\operatorname{img} A$.

4.4.39. True or false: The standard algorithm for finding a basis for $\ker A$ will always produce an orthogonal basis.

♦ 4.4.40. Is Theorem 4.45 true as stated for complex matrices? If not, can you formulate a similar theorem that is true? What is the Fredholm alternative for complex matrices?


4.5  Orthogonal  Polynomials 

Orthogonal and orthonormal bases play, if anything, an even more essential role in function spaces. Unlike the Euclidean space $\mathbb{R}^n$, most of the obvious bases of a typical (finite-dimensional) function space are not orthogonal with respect to any natural inner product. Thus, the computation of an orthonormal basis of functions is a critical step towards simplification of the analysis. The Gram-Schmidt algorithm, in any of the above formulations, can be successfully applied to construct suitably orthogonal functions. The most important examples are the classical orthogonal polynomials that arise in approximation and interpolation theory. Other orthogonal systems of functions play starring roles in Fourier analysis and its generalizations, including wavelets, in quantum mechanics, in the solution of partial differential equations by separation of variables, and a host of further applications in mathematics, physics, engineering, numerical analysis, etc., [43, 54, 62, 61, 77, 79, 88].

The  Legendre  Polynomials 

We shall construct an orthonormal basis for the vector space $\mathcal{P}^{(n)}$ of polynomials of degree $\le n$. For definiteness, the construction will be based on the $L^2$ inner product

\[ \langle p, q \rangle = \int_{-1}^{1} p(t)\, q(t)\, dt \tag{4.52} \]

on the interval $[-1, 1]$. The underlying method will work on any other bounded interval, as well as for weighted inner products, but (4.52) is of particular importance. We shall apply the Gram-Schmidt orthogonalization process to the elementary, but non-orthogonal, monomial basis $1, t, t^2, \dots, t^n$. Because

\[ \langle t^k, t^l \rangle = \int_{-1}^{1} t^{k+l}\, dt = \begin{cases} \dfrac{2}{k + l + 1}, & k + l \ \text{even}, \\[1ex] 0, & k + l \ \text{odd}, \end{cases} \tag{4.53} \]

odd-degree monomials are orthogonal to those of even degree, but that is all. We will use $q_0(t), q_1(t), \dots, q_n(t)$ to denote the resulting orthogonal polynomials. We begin by setting

\[ q_0(t) = 1, \qquad \text{with} \qquad \| q_0 \|^2 = \int_{-1}^{1} q_0(t)^2\, dt = 2. \]
According to formula (4.17), the next orthogonal basis polynomial is

$$q_1(t) = t - \frac{\langle t, q_0 \rangle}{\| q_0 \|^2}\, q_0(t) = t, \qquad \text{with} \qquad \| q_1 \|^2 = \frac{2}{3}.$$


In general, the Gram-Schmidt formula (4.19) says we should define

$$q_k(t) = t^k - \sum_{j=0}^{k-1} \frac{\langle t^k, q_j \rangle}{\| q_j \|^2}\, q_j(t) \qquad \text{for} \qquad k = 1, 2, \ldots.$$


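We can thus recursively compute the next few orthogonal polynomials:

$$\begin{aligned}
q_2(t) &= t^2 - \tfrac{1}{3}, & \| q_2 \|^2 &= \tfrac{8}{45}, \\
q_3(t) &= t^3 - \tfrac{3}{5}\, t, & \| q_3 \|^2 &= \tfrac{8}{175}, \\
q_4(t) &= t^4 - \tfrac{6}{7}\, t^2 + \tfrac{3}{35}, & \| q_4 \|^2 &= \tfrac{128}{11025}, \\
q_5(t) &= t^5 - \tfrac{10}{9}\, t^3 + \tfrac{5}{21}\, t, & \| q_5 \|^2 &= \tfrac{128}{43659},
\end{aligned} \tag{4.54}$$

and so on. The reader can verify that they satisfy the orthogonality conditions

$$\langle q_i, q_j \rangle = \int_{-1}^{1} q_i(t)\, q_j(t)\, dt = 0, \qquad i \ne j.$$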
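The recursion above can be checked mechanically. The following is a minimal sketch, assuming polynomials are stored as coefficient lists (index = power of $t$) and using exact rational arithmetic; the function names `inner` and `monic_legendre` are illustrative choices, not notation from the text. It applies the Gram-Schmidt formula to the monomials, with inner products evaluated through (4.53).

```python
from fractions import Fraction

def inner(p, q):
    """L2 inner product on [-1, 1] of two coefficient lists,
    using <t^k, t^l> = 2/(k+l+1) when k+l is even and 0 otherwise (4.53)."""
    total = Fraction(0)
    for k, a in enumerate(p):
        for l, b in enumerate(q):
            if (k + l) % 2 == 0:
                total += a * b * Fraction(2, k + l + 1)
    return total

def monic_legendre(n):
    """Return [q_0, ..., q_n], obtained by Gram-Schmidt on 1, t, ..., t^n."""
    qs = []
    for k in range(n + 1):
        mono = [Fraction(0)] * k + [Fraction(1)]   # the monomial t^k
        q = mono[:]
        for qj in qs:
            c = inner(mono, qj) / inner(qj, qj)    # <t^k, q_j> / ||q_j||^2
            for i, a in enumerate(qj):
                q[i] -= c * a                      # subtract the projection onto q_j
        qs.append(q)
    return qs

qs = monic_legendre(5)
for k, q in enumerate(qs):
    print(k, [str(c) for c in q], "norm^2 =", inner(q, q))
```

Exact fractions are used instead of floating point so that the output can be compared entry by entry with the table (4.54).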


The resulting polynomials $q_0, q_1, q_2, \ldots$ are known as the monic† Legendre polynomials, in
honor of the eighteenth-century French mathematician Adrien-Marie Legendre, who first
used them for studying Newtonian gravitation. Since the first $n$ Legendre polynomials,
namely $q_0, \ldots, q_{n-1}$, span the subspace of polynomials of degree $\le n - 1$, the next
one, $q_n$, can be characterized as the unique monic polynomial that is orthogonal to every
polynomial of degree $\le n - 1$:

$$\langle t^k, q_n \rangle = 0, \qquad k = 0, \ldots, n - 1. \tag{4.55}$$

† A polynomial is called monic if its leading coefficient is equal to 1.


Since the monic Legendre polynomials form a basis for the space of polynomials, we can
uniquely rewrite any polynomial of degree $n$ as a linear combination:

$$p(t) = c_0\, q_0(t) + c_1\, q_1(t) + \cdots + c_n\, q_n(t). \tag{4.56}$$

In view of the general orthogonality formula (4.7), the coefficients are simply given by inner
products

$$c_k = \frac{\langle p, q_k \rangle}{\| q_k \|^2} = \frac{1}{\| q_k \|^2} \int_{-1}^{1} p(t)\, q_k(t)\, dt, \qquad k = 0, \ldots, n. \tag{4.57}$$


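For example,

$$t^4 = q_4(t) + \tfrac{6}{7}\, q_2(t) + \tfrac{1}{5}\, q_0(t) = \left( t^4 - \tfrac{6}{7}\, t^2 + \tfrac{3}{35} \right) + \tfrac{6}{7} \left( t^2 - \tfrac{1}{3} \right) + \tfrac{1}{5},$$

where the coefficients can be obtained either directly or via (4.57):

$$c_4 = \frac{11025}{128} \int_{-1}^{1} t^4\, q_4(t)\, dt = 1, \qquad c_3 = \frac{175}{8} \int_{-1}^{1} t^4\, q_3(t)\, dt = 0,$$

and so on.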
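The coefficient formula (4.57) is easy to evaluate exactly. The sketch below assumes the monic Legendre polynomials $q_0, \ldots, q_4$ of (4.54) are entered by hand as coefficient lists; the names `Q` and `legendre_coefficients` are hypothetical helpers, and the routine is limited to polynomials of degree at most 4.

```python
from fractions import Fraction as F

def inner(p, q):
    """L2 inner product on [-1, 1] of coefficient lists (index = power of t)."""
    return sum(a * b * F(2, k + l + 1)
               for k, a in enumerate(p)
               for l, b in enumerate(q)
               if (k + l) % 2 == 0)

# Monic Legendre polynomials q_0, ..., q_4, copied from (4.54)
Q = [
    [F(1)],
    [F(0), F(1)],
    [F(-1, 3), F(0), F(1)],
    [F(0), F(-3, 5), F(0), F(1)],
    [F(3, 35), F(0), F(-6, 7), F(0), F(1)],
]

def legendre_coefficients(p):
    """Coefficients c_k = <p, q_k> / ||q_k||^2 of (4.57), for deg p <= 4."""
    return [inner(p, q) / inner(q, q) for q in Q[:len(p)]]

t4 = [F(0), F(0), F(0), F(0), F(1)]          # p(t) = t^4
print([str(c) for c in legendre_coefficients(t4)])
```

For $p(t) = t^4$ the computed list should reproduce the expansion in the example, with $c_0 = \tfrac15$, $c_2 = \tfrac67$, $c_4 = 1$, and vanishing odd-order coefficients.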


The classical Legendre polynomials, [59], are certain scalar multiples, namely

$$P_k(t) = \frac{(2k)!}{2^k\, (k!)^2}\, q_k(t), \qquad k = 0, 1, 2, \ldots, \tag{4.58}$$

and so also define a system of orthogonal polynomials. The multiple is fixed by the
requirement that

$$P_k(1) = 1, \tag{4.59}$$

which  is  not  so  important  here,  but  does  play  a  role  in  other  applications.  The  first  few 
classical  Legendre  polynomials  are 


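$$\begin{aligned}
P_0(t) &= 1, & \| P_0 \|^2 &= 2, \\
P_1(t) &= t, & \| P_1 \|^2 &= \tfrac{2}{3}, \\
P_2(t) &= \tfrac{3}{2}\, t^2 - \tfrac{1}{2}, & \| P_2 \|^2 &= \tfrac{2}{5}, \\
P_3(t) &= \tfrac{5}{2}\, t^3 - \tfrac{3}{2}\, t, & \| P_3 \|^2 &= \tfrac{2}{7}, \\
P_4(t) &= \tfrac{35}{8}\, t^4 - \tfrac{15}{4}\, t^2 + \tfrac{3}{8}, & \| P_4 \|^2 &= \tfrac{2}{9}, \\
P_5(t) &= \tfrac{63}{8}\, t^5 - \tfrac{35}{4}\, t^3 + \tfrac{15}{8}\, t, & \| P_5 \|^2 &= \tfrac{2}{11},
\end{aligned} \tag{4.60}$$

and are graphed in Figure 4.8. There is, in fact, an explicit formula for the Legendre
polynomials, due to the early nineteenth-century mathematician, banker, and social reformer
Olinde Rodrigues.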


Theorem 4.52. The Rodrigues formula for the classical Legendre polynomials is

$$P_k(t) = \frac{1}{2^k\, k!}\, \frac{d^k}{dt^k}\, (t^2 - 1)^k, \qquad \| P_k \| = \sqrt{\frac{2}{2k + 1}}, \qquad k = 0, 1, 2, \ldots. \tag{4.61}$$

Figure 4.8. The Legendre Polynomials $P_0(t), \ldots, P_5(t)$.


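Thus, for example,

$$P_4(t) = \frac{1}{16 \cdot 4!}\, \frac{d^4}{dt^4}\, (t^2 - 1)^4 = \frac{1}{384}\, \frac{d^4}{dt^4}\, (t^2 - 1)^4 = \tfrac{35}{8}\, t^4 - \tfrac{15}{4}\, t^2 + \tfrac{3}{8}.$$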
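The Rodrigues formula lends itself to a direct symbolic check. The following sketch, again with polynomials as coefficient lists and with illustrative helper names (`poly_mul`, `poly_diff`, `legendre_rodrigues`), expands $(t^2 - 1)^k$, differentiates it $k$ times, and divides by $2^k\, k!$.

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_diff(p):
    """Derivative of a polynomial given as a coefficient list."""
    return [m * c for m, c in enumerate(p)][1:] or [Fraction(0)]

def legendre_rodrigues(k):
    """P_k(t) = 1/(2^k k!) d^k/dt^k (t^2 - 1)^k, as in (4.61)."""
    p = [Fraction(1)]
    for _ in range(k):
        p = poly_mul(p, [Fraction(-1), Fraction(0), Fraction(1)])  # times (t^2 - 1)
    for _ in range(k):
        p = poly_diff(p)
    scale = Fraction(1, 2**k * factorial(k))
    return [scale * c for c in p]

P4 = legendre_rodrigues(4)
print([str(c) for c in P4])   # ['3/8', '0', '-15/4', '0', '35/8']
```

Summing the coefficients of `legendre_rodrigues(k)` evaluates the polynomial at $t = 1$, which gives a quick check of the normalization (4.59).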


Proof of Theorem 4.52: Let

$$R_{j,k}(t) = \frac{d^j}{dt^j}\, (t^2 - 1)^k, \tag{4.62}$$

which is evidently a polynomial of degree $2k - j$. In particular, the Rodrigues formula (4.61)
claims that $P_k(t)$ is a multiple of $R_{k,k}(t)$. Note that

$$\frac{d}{dt}\, R_{j,k}(t) = R_{j+1,k}(t). \tag{4.63}$$

Moreover,

$$R_{j,k}(1) = 0 = R_{j,k}(-1) \qquad \text{whenever} \qquad j < k, \tag{4.64}$$

since, by the product rule, differentiating $(t^2 - 1)^k$ a total of $j < k$ times still leaves at
least one factor of $t^2 - 1$ in each summand, which therefore vanishes at $t = \pm 1$. In order
to complete the proof of the first formula, let us establish the following result:

Lemma 4.53. If $j \le k$, then the polynomial $R_{j,k}(t)$ is orthogonal to all polynomials of
degree $\le j - 1$.

Proof: In other words,

$$\langle t^i, R_{j,k} \rangle = \int_{-1}^{1} t^i\, R_{j,k}(t)\, dt = 0 \qquad \text{for all} \qquad 0 \le i < j \le k. \tag{4.65}$$

Since $j > 0$, we use (4.63) to write $R_{j,k}(t) = R'_{j-1,k}(t)$. Integrating by parts,

$$\langle t^i, R_{j,k} \rangle = \int_{-1}^{1} t^i\, R'_{j-1,k}(t)\, dt = \Big[\, t^i\, R_{j-1,k}(t) \Big]_{t=-1}^{1} - i \int_{-1}^{1} t^{i-1}\, R_{j-1,k}(t)\, dt = -\, i\, \langle t^{i-1}, R_{j-1,k} \rangle,$$

where the boundary terms vanish owing to (4.64). In particular, setting $i = 0$ proves
$\langle 1, R_{j,k} \rangle = 0$ for all $j > 0$. We then repeat the process, and, eventually, for any $j > i$,

$$\langle t^i, R_{j,k} \rangle = -\, i\, \langle t^{i-1}, R_{j-1,k} \rangle = i\, (i - 1)\, \langle t^{i-2}, R_{j-2,k} \rangle = \cdots = (-1)^i\, i\, (i - 1) \cdots 3 \cdot 2\, \langle 1, R_{j-i,k} \rangle = 0,$$

completing the proof. Q.E.D.
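As a sanity check on Lemma 4.53 (not a substitute for the proof), one can evaluate the inner products in (4.65) exactly for small indices. The sketch below is an illustration under the same coefficient-list convention as before; the names `R` and `inner_monomial` are hypothetical.

```python
from fractions import Fraction

def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_diff(p):
    return [m * c for m, c in enumerate(p)][1:] or [Fraction(0)]

def R(j, k):
    """Coefficients of R_{j,k}(t) = d^j/dt^j (t^2 - 1)^k, as in (4.62)."""
    p = [Fraction(1)]
    for _ in range(k):
        p = poly_mul(p, [Fraction(-1), Fraction(0), Fraction(1)])
    for _ in range(j):
        p = poly_diff(p)
    return p

def inner_monomial(i, p):
    """<t^i, p> on [-1, 1]; the integral of t^n is 2/(n+1) for even n, 0 for odd n."""
    return sum(c * Fraction(2, i + m + 1)
               for m, c in enumerate(p) if (i + m) % 2 == 0)

# Collect every index triple 0 <= i < j <= k <= 5 that violates (4.65)
violations = [(i, j, k)
              for k in range(1, 6)
              for j in range(1, k + 1)
              for i in range(j)
              if inner_monomial(i, R(j, k)) != 0]
print(violations)
```

The lemma predicts an empty list. By contrast, the borderline case $i = j$ is not covered by the lemma, and, for instance, $\langle t^2, R_{2,2} \rangle$ is nonzero.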

In particular, $R_{k,k}(t)$ is a polynomial of degree $k$ that is orthogonal to every polynomial
of degree $\le k - 1$. By our earlier remarks, this implies that it must be a constant multiple,

$$R_{k,k}(t) = c_k\, P_k(t),$$

of the $k$th Legendre polynomial. To determine $c_k$, we need only compare the leading terms:

$$R_{k,k}(t) = \frac{d^k}{dt^k}\, (t^2 - 1)^k = \frac{d^k}{dt^k}\, (t^{2k} + \cdots) = \frac{(2k)!}{k!}\, t^k + \cdots, \qquad \text{while} \qquad P_k(t) = \frac{(2k)!}{2^k\, (k!)^2}\, t^k + \cdots.$$

We conclude that $c_k = 2^k\, k!$, which proves the first formula in (4.61). The proof of the
formula for $\| P_k \|$ can be found in Exercise 4.5.9. Q.E.D.


The Legendre polynomials play an important role in many aspects of applied mathematics,
including numerical analysis, least squares approximation of functions, and the
solution of partial differential equations, [61].


Exercises 

4.5.1. Write the following polynomials as linear combinations of monic Legendre polynomials.
Use orthogonality to compute the coefficients: (a) $t^3$, (b) $t^4 + t^2$, (c) $7t^4 + 2t^3 - t$.

4.5.2. (a) Find the monic Legendre polynomial of degree 5 using the Gram-Schmidt process.
Check your answer using the Rodrigues formula. (b) Use orthogonality to write $t^5$ as a
linear combination of Legendre polynomials. (c) Repeat the exercise for degree 6.

0 4.5.3. (a) Explain why $q_n$ is the unique monic polynomial that satisfies (4.55). (b) Use this
characterization to directly construct $q_5(t)$.

4.5.4.  Prove  that  the  even  (odd)  degree  Legendre  polynomials  are  even  (odd)  functions  of  t. 

4.5.5. Prove that if $p(t) = p(-t)$ is an even polynomial, then all the odd-order coefficients
$c_{2j+1} = 0$ in its Legendre expansion (4.56) vanish.

4.5.6. Write out an explicit Rodrigues-type formula for the monic Legendre polynomial $q_k(t)$
and its norm.

4.5.7. Write out an explicit Rodrigues-type formula for an orthonormal basis $Q_0(t), \ldots, Q_n(t)$
for the space of polynomials of degree $\le n$ under the inner product (4.52).

0 4.5.8. Use the Rodrigues formula to prove (4.59). Hint: Write $(t^2 - 1)^k = (t - 1)^k\, (t + 1)^k$.

O 4.5.9. A proof of the formula in (4.61) for the norms of the Legendre polynomials is based
on the following steps. (a) First, prove that $\| R_{k,k} \|^2 = (-1)^k\, (2k)! \int_{-1}^{1} (t^2 - 1)^k\, dt$ by a
repeated integration by parts. (b) Second, prove that

$$\int_{-1}^{1} (t^2 - 1)^k\, dt = (-1)^k\, \frac{2^{2k+1}\, (k!)^2}{(2k + 1)!}$$

by using the change of variables $t = \cos\theta$ in the integral. The resulting trigonometric
integral can be done by another repeated integration by parts. (c) Finally, use the
Rodrigues formula to complete the proof.


T 4.5.10. (a) Find the roots, $P_n(t) = 0$, of the Legendre polynomials $P_2$, $P_3$ and $P_4$. (b) Prove
that for $0 \le j \le k$, the polynomial $R_{j,k}(t)$ defined in (4.62) has roots of order $k - j$ at
$t = \pm 1$, and $j$ additional simple roots lying between $-1$ and $1$. Hint: Use induction on $j$
and Rolle's Theorem from calculus, [2, 78]. (c) Conclude that all $k$ roots of the Legendre
polynomial $P_k(t)$ are real and simple, and that they lie in the interval $-1 < t < 1$.


Other  Systems  of  Orthogonal  Polynomials 


The standard Legendre polynomials form an orthogonal system with respect to the $L^2$
inner product on the interval $[-1, 1]$. Dealing with any other interval, or, more generally,
a weighted inner product, leads to a different, suitably adapted collection of orthogonal
polynomials. In all cases, applying the Gram-Schmidt process to the standard monomials
$1, t, t^2, t^3, \ldots$ will produce the desired orthogonal system.

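Example 4.54. In this example, we construct orthogonal polynomials for the weighted
inner product†

$$\langle f, g \rangle = \int_0^\infty f(t)\, g(t)\, e^{-t}\, dt \tag{4.66}$$

on the interval $[0, \infty)$. A straightforward integration by parts proves that

$$\int_0^\infty t^k\, e^{-t}\, dt = k!, \qquad \text{and hence} \qquad \langle t^i, t^j \rangle = (i + j)!, \qquad \| t^i \|^2 = (2i)!. \tag{4.67}$$

We apply the Gram-Schmidt process to construct a system of orthogonal polynomials for
this inner product. The first few are

$$\begin{aligned}
q_0(t) &= 1, & \| q_0 \|^2 &= 1, \\
q_1(t) &= t - \frac{\langle t, q_0 \rangle}{\| q_0 \|^2}\, q_0(t) = t - 1, & \| q_1 \|^2 &= 1, \\
q_2(t) &= t^2 - \frac{\langle t^2, q_0 \rangle}{\| q_0 \|^2}\, q_0(t) - \frac{\langle t^2, q_1 \rangle}{\| q_1 \|^2}\, q_1(t) = t^2 - 4t + 2, & \| q_2 \|^2 &= 4, \\
q_3(t) &= t^3 - 9t^2 + 18t - 6, & \| q_3 \|^2 &= 36.
\end{aligned} \tag{4.68}$$

The resulting orthogonal polynomials are known as the (monic) Laguerre polynomials,
named after the nineteenth-century French mathematician Edmond Laguerre, [59].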
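The Laguerre computation follows the same Gram-Schmidt pattern as before, with only the inner product changed. The sketch below is a minimal illustration that evaluates (4.66) on polynomials through the moment formula $\langle t^i, t^j \rangle = (i + j)!$; the names `inner` and `monic_laguerre` are assumptions made for this example.

```python
from fractions import Fraction
from math import factorial

def inner(p, q):
    """Weighted inner product (4.66) of coefficient lists, via <t^i, t^j> = (i+j)!."""
    return sum(a * b * factorial(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

def monic_laguerre(n):
    """Return [q_0, ..., q_n], obtained by Gram-Schmidt on 1, t, ..., t^n."""
    qs = []
    for k in range(n + 1):
        mono = [Fraction(0)] * k + [Fraction(1)]   # the monomial t^k
        q = mono[:]
        for qj in qs:
            c = inner(mono, qj) / inner(qj, qj)    # <t^k, q_j> / ||q_j||^2
            for i, a in enumerate(qj):
                q[i] -= c * a
        qs.append(q)
    return qs

qs = monic_laguerre(3)
for k, q in enumerate(qs):
    print(k, [str(c) for c in q], "norm^2 =", inner(q, q))
```

Because only `inner` depends on the weight and interval, the same loop can be reused for any inner product whose monomial moments are known in closed form.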


† The functions $f, g$ must not grow too rapidly as $t \to \infty$ in order that the inner product be
defined. For example, polynomial growth, meaning $| f(t) |, | g(t) | \le C\, t^N$ for $t \gg 0$ and some
$C > 0$, $0 \le N < \infty$, suffices.

In some cases, a change of variables may be used to relate systems of orthogonal
polynomials and thereby circumvent the Gram-Schmidt computation. Suppose, for instance,
that our goal is to construct an orthogonal system of polynomials for the $L^2$ inner product

$$\langle\langle f, g \rangle\rangle = \int_a^b f(t)\, g(t)\, dt$$

on the interval $[a, b]$. The key remark is that we can map the interval $[-1, 1]$ to $[a, b]$ by
a simple change of variables of the form $s = \alpha + \beta\, t$. Specifically,

$$s = \frac{2t - b - a}{b - a} \qquad \text{will change} \qquad a \le t \le b \qquad \text{to} \qquad -1 \le s \le 1. \tag{4.69}$$


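It therefore changes functions $F(s), G(s)$, defined for $-1 \le s \le 1$, into functions

$$f(t) = F\!\left( \frac{2t - b - a}{b - a} \right), \qquad g(t) = G\!\left( \frac{2t - b - a}{b - a} \right), \tag{4.70}$$

defined for $a \le t \le b$. Moreover, when integrating, we have $ds = \dfrac{2}{b - a}\, dt$, and so the inner
products are related by

$$\langle\langle f, g \rangle\rangle = \int_a^b f(t)\, g(t)\, dt = \int_a^b F\!\left( \frac{2t - b - a}{b - a} \right) G\!\left( \frac{2t - b - a}{b - a} \right) dt = \int_{-1}^{1} F(s)\, G(s)\, \frac{b - a}{2}\, ds = \frac{b - a}{2}\, \langle F, G \rangle, \tag{4.71}$$

where the final $L^2$ inner product is over the interval $[-1, 1]$. In particular, the change of
variables maintains orthogonality, while rescaling the norms; explicitly,

$$\langle\langle f, g \rangle\rangle = 0 \quad \text{if and only if} \quad \langle F, G \rangle = 0, \qquad \text{while} \qquad \| f \| = \sqrt{\frac{b - a}{2}}\; \| F \|. \tag{4.72}$$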


Moreover,  if  F(s)  is  a  polynomial  of  degree  n  in  s,  then  f(t)  is  a  polynomial  of  degree  n  in 
t  and  conversely.  Let  us  apply  these  observations  to  the  Legendre  polynomials: 


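Proposition 4.55. The transformed Legendre polynomials

$$\widetilde{P}_k(t) = P_k\!\left( \frac{2t - b - a}{b - a} \right), \qquad \| \widetilde{P}_k \| = \sqrt{\frac{b - a}{2k + 1}}, \qquad k = 0, 1, 2, \ldots, \tag{4.73}$$

form an orthogonal system of polynomials with respect to the $L^2$ inner product on the
interval $[a, b]$.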


Example 4.56. Consider the $L^2$ inner product $\langle\langle f, g \rangle\rangle = \int_0^1 f(t)\, g(t)\, dt$. The map
$s = 2t - 1$ will change $0 \le t \le 1$ to $-1 \le s \le 1$. According to Proposition 4.55, this
change of variables will convert the Legendre polynomials $P_k(s)$ into an orthogonal system
of polynomials on $[0, 1]$, namely

$$\widetilde{P}_k(t) = P_k(2t - 1), \qquad \text{with corresponding } L^2 \text{ norms} \qquad \| \widetilde{P}_k \| = \frac{1}{\sqrt{2k + 1}}.$$

The first few are

$$\begin{aligned}
\widetilde{P}_0(t) &= 1, \\
\widetilde{P}_1(t) &= 2t - 1, \\
\widetilde{P}_2(t) &= 6t^2 - 6t + 1, \\
\widetilde{P}_3(t) &= 20t^3 - 30t^2 + 12t - 1, \\
\widetilde{P}_4(t) &= 70t^4 - 140t^3 + 90t^2 - 20t + 1, \\
\widetilde{P}_5(t) &= 252t^5 - 630t^4 + 560t^3 - 210t^2 + 30t - 1.
\end{aligned} \tag{4.74}$$

Alternatively, one can derive these formulas through a direct application of the
Gram-Schmidt process.
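The substitution $s = 2t - 1$ can also be carried out mechanically. The following sketch assumes the classical polynomials $P_2$ and $P_3$ are entered from (4.60) as coefficient lists; `substitute_affine` and `inner01` are illustrative helper names. It composes each polynomial with the affine map and then checks orthogonality and the norms on $[0, 1]$ exactly.

```python
from fractions import Fraction

def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def substitute_affine(p, alpha, beta):
    """Coefficients of p(alpha + beta*t), built up with Horner's rule."""
    result = [Fraction(p[-1])]
    for c in reversed(p[:-1]):
        result = poly_mul(result, [Fraction(alpha), Fraction(beta)])
        result[0] += c
    return result

def inner01(p, q):
    """L2 inner product on [0, 1]; the integral of t^n over [0, 1] is 1/(n+1)."""
    return sum(a * b * Fraction(1, i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

# Classical Legendre polynomials P_2 and P_3, copied from (4.60)
P2 = [Fraction(-1, 2), Fraction(0), Fraction(3, 2)]
P3 = [Fraction(0), Fraction(-3, 2), Fraction(0), Fraction(5, 2)]

# s = 2t - 1 is the case a = 0, b = 1 of (4.69): alpha = -1, beta = 2
P2_shift = substitute_affine(P2, -1, 2)
P3_shift = substitute_affine(P3, -1, 2)
print([str(c) for c in P2_shift], [str(c) for c in P3_shift])
print(inner01(P2_shift, P3_shift), inner01(P3_shift, P3_shift))
```

The squared norms computed this way should equal $1/(2k + 1)$, in agreement with Proposition 4.55 for $b - a = 1$.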


Exercises 


4.5.11.  Construct  polynomials  P0,  P1?  P2:  and  P3  of  degree  0, 1,2,  and  3,  respectively,  that  are 

f2 

orthogonal  with  respect  to  the  inner  products  (a)  (f,g)=  J  f(t)g(t)dt ,  (b)  (f  ,g)  = 


t  f(t)g{t)tdt ,  (c)  </,5>  =  f  f(t)g(t)t2  dt,  (d)  (f,g}=  f  f{t)g{t) 

■J  (J  J  —  1  J  —  OO 


e  dt.. 


4.5.12. Find the first four orthogonal polynomials on the interval $[0, 1]$ for the weighted $L^2$
inner product with weight $w(t) = t^2$.

4.5.13.  Write  down  an  orthogonal  basis  for  vector  space  V ^  of  quintic  polynomials  under  the 

f2 

inner  product  (f,g)=  J  f(t)g(t)dt. 

4.5.14. Use the Gram-Schmidt process based on the $L^2$ inner product on $[0, 1]$ to construct a
system of orthogonal polynomials of degree $\le 4$. Verify that your polynomials are multiples
of the modified Legendre polynomials found in Example 4.56.

4.5.15.  Find  the  first  four  orthogonal  polynomials  under  the  Sobolev  H1  inner  product 

</,5>  =  C1  [f(t)9(t)  +  f(t)g'(t)]dt;  cf.  Exercise  3.1.27. 

0 4.5.16. Prove the formula for $\| \widetilde{P}_k \|$ in (4.73).

4.5.17.  Find  the  monic  Laguerre  polynomials  of  degrees  4  and  5  and  their  norms. 

0  4.5.18.  Prove  the  integration  formula  (4.67). 

0 4.5.19. (a) The physicists' Hermite polynomials are orthogonal with respect to the inner
product $\langle f, g \rangle = \int_{-\infty}^{\infty} f(t)\, g(t)\, e^{-t^2}\, dt$. Find the first five monic Hermite polynomials.
Hint: $\int_{-\infty}^{\infty} e^{-t^2}\, dt = \sqrt{\pi}$. (b) The probabilists prefer to use the inner product
$\langle f, g \rangle = \int_{-\infty}^{\infty} f(t)\, g(t)\, e^{-t^2/2}\, dt$. Find the first five of their monic Hermite polynomials.
(c) Can you find a change of variables that transforms the physicists' versions to the
probabilists' versions?

T 4.5.20. The Chebyshev polynomials: (a) Prove that $T_n(t) = \cos(n \arccos t)$, $n = 0, 1, 2, \ldots$,
form a system of orthogonal polynomials under the weighted inner product

$$\langle f, g \rangle = \int_{-1}^{1} \frac{f(t)\, g(t)\, dt}{\sqrt{1 - t^2}}. \tag{4.75}$$

(b) What is $\| T_n \|$? (c) Write out the formulas for $T_0(t), \ldots, T_6(t)$ and plot their graphs.

4.5.21.  Does  the  Gram-Schmidt  process  for  the  inner  product  (4.75)  lead  to  the  Chebyshev 
polynomials  Tn(t)  defined  in  the  preceding  exercise?  Explain  why  or  why  not. 

4.5.22. Find two functions that form an orthogonal basis for the space of the solutions to the
differential equation $y'' - 3y' + 2y = 0$ under the $L^2$ inner product on $[0, 1]$.

4.5.23. Find an orthogonal basis for the space of solutions to the differential equation
$y''' - y'' + y' - y = 0$ for the $L^2$ inner product on $[-\pi, \pi]$.


O 4.5.24. In this exercise, we investigate the effect of more general changes of variables on
orthogonal polynomials. (a) Prove that $t = 2s^2 - 1$ defines a one-to-one map from the
interval $0 \le s \le 1$ to the interval $-1 \le t \le 1$. (b) Let $p_k(t)$ denote the monic Legendre
polynomials, which are orthogonal on $-1 \le t \le 1$. Show that $q_k(s) = p_k(2s^2 - 1)$
defines a polynomial. Write out the cases $k = 0, 1, 2, 3$ explicitly. (c) Are the polynomials
$q_k(s)$ orthogonal under the $L^2$ inner product on $[0, 1]$? If not, do they retain any sort of
orthogonality property? Hint: What happens to the $L^2$ inner product on $[-1, 1]$ under the
change of variables?

4.5.25. (a) Show that the change of variables $s = e^{-t}$ maps the Laguerre inner product (4.66)
to  the  standard  L2  inner  product  on  [0, 1].  However,  explain  why  this  does  not  allow  you 
to  change  Legendre  polynomials  into  Laguerre  polynomials,  (b)  Describe  the  functions 
resulting  from  applying  the  change  of  variables  to  the  modified  Legendre  polynomials  (4.74) 
and  their  orthogonality  properties,  (c)  Describe  the  functions  that  result  from  applying 
the  inverse  change  of  variables  to  the  Laguerre  polynomials  (4.68)  and  their  orthogonality 
properties. 

4.5.26.  Explain  how  to  adapt  the  numerically  stable  Gram-Schmidt  method  in  (4.28)  to 
construct  a  system  of  orthogonal  polynomials.  Test  your  algorithm  on  one  of  the  preceding 
exercises. 


Chapter 5

Minimization  and  Least  Squares 


Because  Nature  seems  to  strive  for  efficiency,  many  systems  arising  in  physical  applications 
are  founded  on  a  minimization  principle.  In  a  mechanical  system,  the  stable  equilibrium 
configurations  minimize  the  potential  energy.  In  an  electrical  circuit,  the  current  adjusts 
itself  to  minimize  the  power.  In  optics  and  relativity,  light  rays  follow  the  paths  of  minimal 
distance  —  the  geodesics  on  the  curved  space-time  manifold.  Solutions  to  most  of  the 
boundary value problems arising in applications to continuum mechanics are also
characterized by a minimization principle, which is then employed to design finite element
numerical approximations to their solutions, [61, 81]. Optimization, that is, finding minima
or maxima, is ubiquitous throughout mathematical modeling, physics, engineering,
economics, and data science, including the calculus of variations, differential geometry, control
theory, design and manufacturing, linear programming, machine learning, and beyond.


This  chapter  introduces  and  solves  the  most  basic  mathematical  minimization  problem: 
a  quadratic  polynomial  function  depending  on  several  variables.  (Minimization  of  more 
complicated  functions  is  of  comparable  significance,  but  relies  on  the  nonlinear  methods 
of  multivariable  calculus,  and  thus  lies  outside  our  scope.)  Assuming  that  the  quadratic 
coefficient  matrix  is  positive  definite,  the  minimizer  can  be  found  by  solving  an  associated 
linear algebraic system. Orthogonality also plays an important role in minimization
problems. Indeed, the orthogonal projection of a point onto a subspace turns out to be the
closest  point  or  least  squares  minimizer.  Moreover,  when  written  in  terms  of  an  orthogonal 
or  orthonormal  basis  for  the  subspace,  the  orthogonal  projection  has  an  elegant  explicit 
formula  that  also  offers  numerical  advantages  over  the  direct  approach  to  least  squares 
minimization. 


The  most  common  way  of  fitting  a  function  to  prescribed  data  points  is  to  minimize  the 
least  squares  error,  which  serves  to  quantify  the  overall  deviation  between  the  data  and  the 
sampled  function  values.  Our  presentation  includes  an  introduction  to  the  interpolation  of 
data  points  by  functions,  with  a  particular  emphasis  on  polynomials  and  splines.  The  final 
Section  5.6  is  devoted  to  the  basics  of  discrete  Fourier  analysis  —  the  interpolation  of  data 
by  trigonometric  functions  —  culminating  in  the  remarkable  Fast  Fourier  Transform,  a  key 
algorithm  in  modern  signal  processing  and  numerical  analysis.  Additional  applications  of 
these  tools  in  equilibrium  mechanics  and  electrical  circuits  will  form  the  focus  of  Chapter  6. 


5.1  Minimization  Problems 

Let us begin by introducing three important minimization problems: the first arising in
physics, the second in analysis, and the third in geometry.

Figure 5.1. Minimizing a Quadratic Function.


Equilibrium  Mechanics 

A  fundamental  principle  of  mechanics  is  that  systems  in  equilibrium  minimize  potential 
energy.  For  example,  a  ball  in  a  bowl  will  roll  downhill  unless  it  is  sitting  at  the  bottom, 
where  its  potential  energy  due  to  gravity  is  at  a  (local)  minimum.  In  the  simplest  class  of 
examples,  the  energy  is  a  quadratic  function,  e.g., 

$$f(x, y) = 3x^2 - 2xy + 4y^2 + x - 2y + 1, \tag{5.1}$$

and one seeks the point $x = x^*$, $y = y^*$ (if one exists) at which $f(x^*, y^*)$ achieves its
overall minimal value.
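For this particular quadratic function the minimizer can be found by hand or by a few lines of code. The sketch below is an illustration only: it assumes the standard calculus criterion that both partial derivatives vanish at a minimum, which leads to a $2 \times 2$ linear system, solved here by Cramer's rule. Since the quadratic part $3x^2 - 2xy + 4y^2$ is positive definite, the unique critical point is the global minimizer.

```python
from fractions import Fraction

def f(x, y):
    """The quadratic function (5.1)."""
    return 3*x**2 - 2*x*y + 4*y**2 + x - 2*y + 1

# Setting both partial derivatives of (5.1) to zero gives the linear system
#    6x - 2y = -1
#   -2x + 8y =  2
a11, a12, b1 = Fraction(6), Fraction(-2), Fraction(-1)
a21, a22, b2 = Fraction(-2), Fraction(8), Fraction(2)

det = a11 * a22 - a12 * a21            # equals 44, so the solution is unique
x_star = (b1 * a22 - a12 * b2) / det   # Cramer's rule
y_star = (a11 * b2 - b1 * a21) / det

print(x_star, y_star, f(x_star, y_star))   # -1/11 5/22 8/11
```

Perturbing `x_star` or `y_star` slightly and re-evaluating `f` gives a quick numerical confirmation that the value $\tfrac{8}{11}$ is not exceeded from below.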

Similarly,  a  pendulum  will  swing  back  and  forth  unless  it  rests  at  the  bottom  of  its  arc, 
where  potential  energy  is  minimized.  Actually,  the  pendulum  has  a  second  equilibrium 
position  at  the  top  of  the  arc,  as  in  Figure  5.2,  but  this  is  rarely  observed,  since  it  is  an 
unstable equilibrium, meaning that any tiny movement will knock it off balance. Therefore,
a better way of stating the principle is that stable equilibria are where the mechanical
system (locally) minimizes potential energy. For a ball rolling on a curved surface, the
local minima (the bottoms of valleys) are the stable equilibria, while the local maxima
(the tops of hills) are unstable. Minimization principles serve to characterize the
equilibrium configurations of a wide range of physical systems, including masses and springs,
structures,  electrical  circuits,  and  even  continuum  models  of  solid  mechanics  and  elasticity, 
fluid  mechanics,  relativity,  electromagnetism,  thermodynamics,  and  so  on. 


Solution  of  Equations 


Suppose we wish to solve a system of equations

$$f_1(\mathbf{x}) = 0, \qquad f_2(\mathbf{x}) = 0, \qquad \ldots, \qquad f_m(\mathbf{x}) = 0, \tag{5.2}$$

where $\mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$. This system can be converted into a minimization problem
in the following seemingly silly manner. Define

$$p(\mathbf{x}) = [\, f_1(\mathbf{x})\, ]^2 + \cdots + [\, f_m(\mathbf{x})\, ]^2 = \| \mathbf{f}(\mathbf{x}) \|^2, \tag{5.3}$$

where $\mathbf{f}(\mathbf{x}) = (\, f_1(\mathbf{x}), \ldots, f_m(\mathbf{x})\, )^T$ and $\| \cdot \|$ denotes the Euclidean norm on $\mathbb{R}^m$. Clearly,
$p(\mathbf{x}) \ge 0$ for all $\mathbf{x}$. Moreover, $p(\mathbf{x}^*) = 0$ if and only if each summand is zero, and hence
$\mathbf{x} = \mathbf{x}^*$ is a solution to (5.2). Therefore, the minimum value of $p(\mathbf{x})$ is zero, and the
minimum is achieved if and only if we are at a solution to the original system of equations.

Figure 5.2. Equilibria of a Pendulum (panels: Stable, Unstable).
For  us,  the  most  important  case  is  that  of  a  linear  system 

Ax  =  b  (5.4) 


consisting  of  m  equations  in  n  unknowns.  In  this  case,  the  solutions  may  be  obtained  by 
minimizing  the  function 


$$p(\mathbf{x}) = \| A \mathbf{x} - \mathbf{b} \|^2, \tag{5.5}$$

where $\| \cdot \|$ denotes the Euclidean norm on $\mathbb{R}^m$. Clearly $p(\mathbf{x})$ has a minimum value of 0,
which  is  achieved  if  and  only  if  x  is  a  solution  to  the  linear  system  (5.4).  Of  course,  it 
is  not  clear  that  we  have  gained  much,  since  we  already  know  how  to  solve  Ax  =  b  by 
Gaussian  Elimination.  However,  this  artifice  turns  out  to  have  profound  consequences. 

Suppose  that  the  linear  system  (5.4)  does  not  have  a  solution,  i.e.,  b  does  not  lie  in 
the  image  of  the  matrix  A.  This  situation  is  very  typical  when  there  are  more  equations 
than  unknowns.  Such  problems  arise  in  data  fitting,  when  the  measured  data  points  are  all 
supposed to lie on a straight line, say, but rarely do so exactly, due to experimental error.
Although we know there is no exact solution to the system, we might still try to find an
approximate solution, that is, a vector $\mathbf{x}^*$ that comes as close to solving the system as possible.
One way to measure closeness is by looking at the magnitude of the error as measured by
the residual vector $\mathbf{r} = \mathbf{b} - A \mathbf{x}$, i.e., the difference between the right- and left-hand sides of
the system. The smaller its norm $\| \mathbf{r} \| = \| A \mathbf{x} - \mathbf{b} \|$, the better the attempted solution. For
the Euclidean norm, the vector $\mathbf{x}^*$ that minimizes the squared residual norm function (5.5)
is known as the least squares solution to the linear system, because $\| \mathbf{r} \|^2 = r_1^2 + \cdots + r_m^2$ is
the sum of the squares of the individual error components. As before, if the linear system
(5.4)  happens  to  have  an  actual  solution,  with  Ax*  =  b,  then  x*  qualifies  as  the  least 
squares  solution  too,  since  in  this  case,  ||  Ax*  —  b  ||  =0  achieves  its  absolute  minimum.  So 
least  squares  solutions  include  traditional  solutions  as  special  cases. 

Unlike an exact solution, the least squares minimizer depends on the choice of inner
product governing the norm; thus a suitable weighted norm can be introduced to emphasize
or de-emphasize the various errors. While not the only possible approach, least squares
is certainly the easiest to analyze and solve, and, hence, is often the method of choice for
fitting functions to experimental data and performing statistical analysis. It is essential
that the norm arise from an inner product; minimizing the error based on other kinds of
norms is a much more difficult, nonlinear problem, although one that has recently become
of immense practical interest in the newly emergent field of compressed sensing, [28].

Figure 5.3. The Closest Point.

The  Closest  Point 


The following minimization problem arises in elementary geometry, although its practical
implications cut a much wider swath. Given a point $\mathbf{b} \in \mathbb{R}^m$ and a subset $V \subset \mathbb{R}^m$, find
the point $\mathbf{v}^* \in V$ that is closest to $\mathbf{b}$. In other words, we seek to minimize the Euclidean
distance $d(\mathbf{v}, \mathbf{b}) = \| \mathbf{v} - \mathbf{b} \|$ over all possible $\mathbf{v} \in V$.

The simplest situation occurs when $V$ is a subspace of $\mathbb{R}^m$. In this case, the closest
point problem can, in fact, be reformulated as a least squares minimization problem. Let
$\mathbf{v}_1, \ldots, \mathbf{v}_n$ be a basis for $V$. The general element $\mathbf{v} \in V$ is a linear combination of the
basis vectors. Applying our handy matrix multiplication formula (2.13), we can write the
subspace elements in the form

$$\mathbf{v} = x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n = A \mathbf{x},$$

where $A = (\, \mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ \mathbf{v}_n\, )$ is the $m \times n$ matrix formed by the (column) basis vectors
and $\mathbf{x} = (\, x_1, x_2, \ldots, x_n\, )^T$ are the coordinates of $\mathbf{v}$ relative to the chosen basis. In this
manner, we can identify $V$ with the image of $A$, i.e., the subspace spanned by its columns.
Consequently, the closest point in $V$ to $\mathbf{b}$ is found by minimizing $\| \mathbf{v} - \mathbf{b} \|^2 = \| A \mathbf{x} - \mathbf{b} \|^2$
over all possible $\mathbf{x} \in \mathbb{R}^n$. But this is exactly the same as the least squares function (5.5)!
Thus, if $\mathbf{x}^*$ is the least squares solution to the system $A \mathbf{x} = \mathbf{b}$, then $\mathbf{v}^* = A \mathbf{x}^*$ is the closest
point to $\mathbf{b}$ belonging to $V = \operatorname{img} A$. In this way, we have established a profound and fertile
connection between least squares solutions to linear systems and the geometrical problem
of minimizing distances to subspaces. And, as we shall see, the closest point $\mathbf{v}^* \in V$ turns
out to be the orthogonal projection of $\mathbf{b}$ onto the subspace.
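The simplest instance is a one-dimensional subspace, where $A$ has a single column $\mathbf{a}$ and the unknown is a single scalar $x$. The sketch below is an illustration under that assumption; `closest_point_on_line` is a hypothetical helper name. Expanding $\| x\, \mathbf{a} - \mathbf{b} \|^2 = x^2\, \mathbf{a} \cdot \mathbf{a} - 2x\, \mathbf{a} \cdot \mathbf{b} + \mathbf{b} \cdot \mathbf{b}$ gives a scalar quadratic in $x$ that is minimized at $x = (\mathbf{a} \cdot \mathbf{b}) / (\mathbf{a} \cdot \mathbf{a})$.

```python
from fractions import Fraction

def closest_point_on_line(a, b):
    """Closest point to b on the line spanned by a nonzero integer vector a.
    Minimizing ||x a - b||^2 over the scalar x gives x = (a . b) / (a . a)."""
    x = Fraction(sum(ai * bi for ai, bi in zip(a, b)),
                 sum(ai * ai for ai in a))
    return [x * ai for ai in a]

a = [1, 2, 2]
b = [3, 0, 3]
v = closest_point_on_line(a, b)
error = [bi - vi for bi, vi in zip(b, v)]          # the vector b - v*
print(v, error, sum(ai * ei for ai, ei in zip(a, error)))
```

In this example the error vector $\mathbf{b} - \mathbf{v}^*$ has zero dot product with $\mathbf{a}$, which is consistent with the closest point being the orthogonal projection of $\mathbf{b}$ onto the line.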

All  three  of  the  preceding  minimization  problems  are  solved  by  the  same  underlying 
mathematical  construction,  which  will  now  be  described  in  detail. 


Remark. In this book, we will concentrate on minimization problems. Maximizing a
function $p(\mathbf{x})$ is the same as minimizing its negative $-p(\mathbf{x})$, and so can be easily handled
by the same techniques.

Exercises

Note :  Unless  otherwise  indicated,  “distance”  refers  to  the  Euclidean  norm. 

5.1.1. Find the least squares solution to the pair of equations $3x = 1$, $2x = -1$.

5.1.2. Find the minimizer of the function $f(x, y) = (3x - 2y + 1)^2 + (2x + y + 2)^2$.

5.1.3. Find the closest point or points to $\mathbf{b} = (-1, 2)$ that lie on (a) the x-axis, (b) the
y-axis, (c) the line $y = x$, (d) the line $x + y = 0$, (e) the line $2x + y = 0$.

5.1.4. Solve Exercise 5.1.3 when distance is measured in (i) the $\infty$ norm, (ii) the 1 norm.

5.1.5. Given $\mathbf{b} \in \mathbb{R}^2$, is the closest point on a line $L$ unique when distance is measured in
(a) the Euclidean norm? (b) the 1 norm? (c) the $\infty$ norm?
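C 5.1.6. Let $L \subset \mathbb{R}^2$ be a line through the origin, and let $\mathbf{b} \in \mathbb{R}^2$ be any point.

(a) Find a geometrical construction of the closest point $\mathbf{v} \in L$ to $\mathbf{b}$ when distance is
measured in the standard Euclidean norm.

(b) Use your construction to prove that there is one and only one closest point.

(c) Show that if $\mathbf{0} \ne \mathbf{a} \in L$, then the distance equals
$\dfrac{\sqrt{\| \mathbf{a} \|^2\, \| \mathbf{b} \|^2 - (\mathbf{a} \cdot \mathbf{b})^2}}{\| \mathbf{a} \|} = \dfrac{| \mathbf{a} \times \mathbf{b} |}{\| \mathbf{a} \|}$,
using the two-dimensional cross product (3.22).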

5.1.7.  Suppose  a  and  b  are  unit  vectors  in  R  .  Show  that  the  distance  from  a  to  the  line 
through  b  is  the  same  as  the  distance  from  b  to  the  line  through  a.  Use  a  picture  to 
explain  why  this  holds.  How  is  the  distance  related  to  the  angle  between  the  two  vectors? 

5.1.8. (a) Prove that the distance from the point $(x_0, y_0)$ to the line $ax + by = 0$ is
$\dfrac{| a x_0 + b y_0 |}{\sqrt{a^2 + b^2}}$. (b) What is the minimum distance to the line $ax + by + c = 0$?

C 5.1.9. (a) Generalize Exercise 5.1.8 to find the distance between a point $(x_0, y_0, z_0)^T$ and the
plane $ax + by + cz + d = 0$ in $\mathbb{R}^3$. (b) Use your formula to compute the distance between
$(1, 1, 1)^T$ and the plane $3x - 2y + z = 1$.

5.1.10. (a) Explain in detail why the minimizer of $\| \mathbf{v} - \mathbf{b} \|$ coincides with the minimizer of
$\| \mathbf{v} - \mathbf{b} \|^2$. (b) Find all scalar functions $F(x)$ for which the minimizer of $F(\| \mathbf{v} - \mathbf{b} \|)$ is
the same as the minimizer of $\| \mathbf{v} - \mathbf{b} \|$.


5.1.11.  (a)  Explain  why  the  problem  of  maximizing  the  distance  from  a  point  to  a  subspace 
does  not  have  a  solution,  (b)  Can  you  formulate  a  situation  in  which  maximizing  distance 
to  a  point  leads  to  a  problem  with  a  solution? 


5.2  Minimization  of  Quadratic  Functions 

The simplest algebraic equations are linear systems. As such, one must thoroughly understand them before venturing into the far more complicated nonlinear realm. For minimization problems, the starting point is the quadratic function. (Linear functions do not have minima: think of the function $f(x) = \alpha x + \beta$, whose graph is a straight line.†) In this section, we shall see how the problem of minimizing a general quadratic function of $n$ variables can be solved by linear algebra techniques.

† Technically, this function is linear only when $\beta = 0$; otherwise it is known as an “affine function”. See Chapter 7 for details.

Figure 5.4. Parabolas.

Let us begin by reviewing the very simplest example: minimizing a scalar quadratic polynomial

\[ p(x) = a x^2 + 2 b x + c \tag{5.6} \]

over all possible values of $x \in \mathbb{R}$. If $a > 0$, then the graph of $p$ is a parabola opening upwards, and so there exists a unique minimum value. If $a < 0$, the parabola points downwards, and there is no minimum (although there is a maximum). If $a = 0$, the graph is a straight line, and there is neither minimum nor maximum over all $x \in \mathbb{R}$, except in the trivial case when $b = 0$ also, so that the function $p(x) = c$ is constant, with every $x$ qualifying as a minimum (and a maximum). The three nontrivial possibilities are illustrated in Figure 5.4.

In the case $a > 0$, the minimum can be found by calculus. The critical points of a function, which are candidates for minima (and maxima), are found by setting its derivative to zero. In this case, differentiating, and solving

\[ p'(x) = 2 a x + 2 b = 0, \]

we conclude that the only possible minimum value occurs at

\[ x^* = -\frac{b}{a}, \qquad \text{where} \qquad p(x^*) = c - \frac{b^2}{a}. \tag{5.7} \]

Of course, one must check that this critical point is indeed a minimum, and not a maximum or inflection point. The second derivative test will show that $p''(x^*) = 2a > 0$, and so $x^*$ is at least a local minimum.

A more instructive approach to this problem, and one that requires only elementary algebra, is to “complete the square”. As in (3.66), we rewrite

\[ p(x) = a \left( x + \frac{b}{a} \right)^2 + \frac{a c - b^2}{a}. \tag{5.8} \]

If $a > 0$, then the first term is always $\geq 0$, and, moreover, attains its minimum value 0 only at $x^* = -b/a$. The second term is constant, and so is unaffected by the value of $x$. Thus, the global minimum of $p(x)$ is at $x^* = -b/a$. Moreover, its minimal value equals the constant term, $p(x^*) = c - b^2/a$, thereby reconfirming and strengthening the calculus result in (5.7).
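The scalar case can be sketched in a few lines of Python. This is only an illustration of formulas (5.7) and (5.8): the function name `minimize_scalar_quadratic` and the sample coefficients are assumptions made here, not part of the text, and exact rational arithmetic is used so that the output can be compared with a hand computation.

```python
from fractions import Fraction

def minimize_scalar_quadratic(a, b, c):
    """Minimize p(x) = a*x**2 + 2*b*x + c over the real line.

    Returns (x_star, p_min) when a > 0; returns None when there is no
    unique minimum (a <= 0: downward parabola, straight line, or constant).
    """
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a <= 0:
        return None
    x_star = -b / a           # where the squared term in (5.8) vanishes
    p_min = c - b * b / a     # the constant term (a*c - b**2) / a in (5.8)
    return x_star, p_min

# Illustrative coefficients (not from the text): p(x) = 2x^2 + 2*3x + 5
x_star, p_min = minimize_scalar_quadratic(2, 3, 5)
```

Note that the coefficient of the linear term is written as $2b$, matching (5.6), so the call above passes $b = 3$ for the polynomial $2x^2 + 6x + 5$.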

Now  that  we  have  the  one-variable  case  firmly  in  hand,  let  us  turn  our  attention  to  the 
more  substantial  problem  of  minimizing  quadratic  functions  of  several  variables.  Thus,  we 
seek  to  minimize  a  (real)  quadratic  polynomial 


\[ p(\mathbf{x}) = p(x_1, \ldots, x_n) = \sum_{i,j=1}^{n} k_{ij}\, x_i x_j \;-\; 2 \sum_{i=1}^{n} f_i\, x_i \;+\; c, \tag{5.9} \]

depending on $n$ variables $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$. The coefficients $k_{ij}$, $f_i$ and $c$ are all assumed to be real. Moreover, we can assume, without loss of generality, that the coefficients of the quadratic terms are symmetric: $k_{ij} = k_{ji}$. (See Exercise 3.4.15 for a justification.) Note that $p(\mathbf{x})$ is more general than a quadratic form (3.52) in that it also contains linear and constant terms. We seek a global minimum, and so the variables $\mathbf{x}$ are allowed to vary over all of $\mathbb{R}^n$. (Minimizing a quadratic function over a proper subset $\mathbf{x} \in S \subsetneq \mathbb{R}^n$ is a more challenging problem, and will not be discussed here.)

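Let us begin by rewriting the quadratic function (5.9) in a more compact matrix notation:

\[ p(\mathbf{x}) = \mathbf{x}^T K \mathbf{x} - 2\, \mathbf{x}^T \mathbf{f} + c, \qquad \mathbf{x} \in \mathbb{R}^n, \tag{5.10} \]

in which $K = (k_{ij})$ is a symmetric $n \times n$ matrix, $\mathbf{f} \in \mathbb{R}^n$ is a constant vector, and $c$ is a constant scalar.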

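Example 5.1. Consider the quadratic function

\[ p(x_1, x_2) = 4 x_1^2 - 2 x_1 x_2 + 3 x_2^2 + 3 x_1 - 2 x_2 + 1 \]

depending on two real variables $x_1, x_2$. It can be written in the matrix form (5.10) as

\[ p(x_1, x_2) = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 4 & -1 \\ -1 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - 2 \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} -\frac{3}{2} \\ 1 \end{pmatrix} + 1, \tag{5.11} \]

whereby

\[ K = \begin{pmatrix} 4 & -1 \\ -1 & 3 \end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad \mathbf{f} = \begin{pmatrix} -\frac{3}{2} \\ 1 \end{pmatrix}, \qquad c = 1. \tag{5.12} \]

Pay attention to the symmetry of $K = K^T$, whereby its corresponding off-diagonal entries, here both $-1$, are each one-half the coefficient of the corresponding quadratic monomial, in this case $-2 x_1 x_2$. Also note the overall factor of $-2$ in front of the linear terms, which is included for later convenience.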
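As a quick sanity check on the bookkeeping in (5.11) and (5.12), the following Python sketch evaluates the polynomial of Example 5.1 both term by term and through the matrix form (5.10), and compares the two at a few sample points. The helper names `p_polynomial` and `p_matrix` and the sample points are illustrative choices made here, not notation from the text.

```python
from fractions import Fraction as Fr

# Data of (5.12)
K = [[Fr(4), Fr(-1)],
     [Fr(-1), Fr(3)]]
f = [Fr(-3, 2), Fr(1)]
c = Fr(1)

def p_polynomial(x1, x2):
    """The quadratic function of Example 5.1, written out term by term."""
    return 4*x1**2 - 2*x1*x2 + 3*x2**2 + 3*x1 - 2*x2 + 1

def p_matrix(x):
    """Evaluate x^T K x - 2 x^T f + c, the matrix form (5.10)."""
    n = len(x)
    quad = sum(K[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    lin = sum(f[i] * x[i] for i in range(n))
    return quad - 2 * lin + c

# Arbitrary sample points; any choice would do
samples = [(Fr(0), Fr(0)), (Fr(1), Fr(-2)), (Fr(3, 7), Fr(5, 2))]
agree = all(p_polynomial(x1, x2) == p_matrix([x1, x2]) for x1, x2 in samples)
```

The two off-diagonal entries $-1$ of $K$ each contribute $-x_1 x_2$ to the double sum, which is how the single monomial $-2 x_1 x_2$ is recovered.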

We  first  note  that  in  the  simple  scalar  case  (5.6),  we  needed  to  impose  the  condition 
that  the  quadratic  coefficient  a  be  positive  in  order  to  obtain  a  (unique)  minimum.  The 
corresponding  condition  for  the  multivariable  case  is  that  the  quadratic  coefficient  matrix 
K  be  positive  definite.  This  key  assumption  enables  us  to  establish  a  general  minimization 
criterion. 

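Theorem 5.2. If $K$ is a positive definite (and hence symmetric) matrix, then the quadratic function (5.10) has a unique minimizer, which is the solution to the linear system

\[ K \mathbf{x} = \mathbf{f}, \qquad \text{namely} \qquad \mathbf{x}^* = K^{-1} \mathbf{f}. \tag{5.13} \]

The minimum value of $p(\mathbf{x})$ is equal to any of the following expressions:

\[ p(\mathbf{x}^*) = p(K^{-1}\mathbf{f}) = c - \mathbf{f}^T K^{-1} \mathbf{f} = c - \mathbf{f}^T \mathbf{x}^* = c - (\mathbf{x}^*)^T K \mathbf{x}^*. \tag{5.14} \]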


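Proof: First recall that, by Proposition 3.31, positive definiteness implies that $K$ is a nonsingular matrix, and hence the linear system (5.13) has a unique solution $\mathbf{x}^* = K^{-1}\mathbf{f}$. Then, for all $\mathbf{x} \in \mathbb{R}^n$, since $\mathbf{f} = K \mathbf{x}^*$, it follows that

\[ \begin{aligned} p(\mathbf{x}) &= \mathbf{x}^T K \mathbf{x} - 2\, \mathbf{x}^T \mathbf{f} + c = \mathbf{x}^T K \mathbf{x} - 2\, \mathbf{x}^T K \mathbf{x}^* + c \\ &= (\mathbf{x} - \mathbf{x}^*)^T K (\mathbf{x} - \mathbf{x}^*) + \bigl[\, c - (\mathbf{x}^*)^T K \mathbf{x}^* \,\bigr], \end{aligned} \tag{5.15} \]

where we used the symmetry of $K = K^T$ to identify the scalar terms

\[ \mathbf{x}^T K \mathbf{x}^* = (\mathbf{x}^T K \mathbf{x}^*)^T = (\mathbf{x}^*)^T K^T \mathbf{x} = (\mathbf{x}^*)^T K \mathbf{x}. \]

The first term in the final expression in (5.15) has the form $\mathbf{y}^T K \mathbf{y}$, where $\mathbf{y} = \mathbf{x} - \mathbf{x}^*$. Since we assumed that $K$ is positive definite, we know that $\mathbf{y}^T K \mathbf{y} > 0$ for all $\mathbf{y} \neq \mathbf{0}$. Thus, the first term achieves its minimum value, namely 0, if and only if $\mathbf{0} = \mathbf{y} = \mathbf{x} - \mathbf{x}^*$. Since $\mathbf{x}^*$ is fixed, the second, bracketed, term does not depend on $\mathbf{x}$, and hence the minimizer of $p(\mathbf{x})$ coincides with the minimizer of the first term, namely $\mathbf{x} = \mathbf{x}^*$. Moreover, the minimum value of $p(\mathbf{x})$ is equal to the constant term: $p(\mathbf{x}^*) = c - (\mathbf{x}^*)^T K \mathbf{x}^*$. The alternative expressions in (5.14) follow from simple substitutions. Q.E.D.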


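Example 5.1 (continued). Let us minimize the quadratic function appearing in (5.11) above. According to Theorem 5.2, to find the minimum we must solve the linear system $K\mathbf{x} = \mathbf{f}$, which, in this case, is

\[ \begin{pmatrix} 4 & -1 \\ -1 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -\frac{3}{2} \\ 1 \end{pmatrix}. \tag{5.16} \]

When applying the usual Gaussian Elimination algorithm, only one row operation is required to place the coefficient matrix in upper triangular form:

\[ \begin{pmatrix} 4 & -1 \\ -1 & 3 \end{pmatrix} \;\longmapsto\; \begin{pmatrix} 4 & -1 \\ 0 & \frac{11}{4} \end{pmatrix}. \]

The coefficient matrix is regular, since no row interchanges were required, and its two pivots, namely $4$ and $\frac{11}{4}$, are both positive. Thus, by Theorem 3.43, $K > 0$, and hence $p(x_1, x_2)$ really does have a minimum, obtained by applying Back Substitution to the reduced system:

\[ \mathbf{x}^* = \begin{pmatrix} x_1^* \\ x_2^* \end{pmatrix} = \begin{pmatrix} -\frac{7}{22} \\ \frac{5}{22} \end{pmatrix} \approx \begin{pmatrix} -.31818 \\ .22727 \end{pmatrix}. \tag{5.17} \]

The quickest way to compute the minimal value is to use the second formula in (5.14):

\[ p(\mathbf{x}^*) = p\left( -\tfrac{7}{22}, \tfrac{5}{22} \right) = 1 - \begin{pmatrix} -\frac{3}{2} & 1 \end{pmatrix} \begin{pmatrix} -\frac{7}{22} \\ \frac{5}{22} \end{pmatrix} = \frac{13}{44} \approx .29545. \]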


It is instructive to compare the algebraic solution method with the minimization procedure you learned in multi-variable calculus, cf. [2, 78]. The critical points of $p(x_1, x_2)$ are found by setting both partial derivatives equal to zero:

\[ \frac{\partial p}{\partial x_1} = 8 x_1 - 2 x_2 + 3 = 0, \qquad \frac{\partial p}{\partial x_2} = -2 x_1 + 6 x_2 - 2 = 0. \]

If we divide by an overall factor of 2, these are precisely the same linear equations we already constructed in (5.16). Thus, not surprisingly, the calculus approach leads to the same minimizer (5.17). To check whether $\mathbf{x}^*$ is a (local) minimum, we need to apply the second derivative test. In the case of a function of several variables, this requires analyzing the Hessian matrix, which is the symmetric matrix of second order partial derivatives

\[ H = \begin{pmatrix} \dfrac{\partial^2 p}{\partial x_1^2} & \dfrac{\partial^2 p}{\partial x_1 \partial x_2} \\[2ex] \dfrac{\partial^2 p}{\partial x_1 \partial x_2} & \dfrac{\partial^2 p}{\partial x_2^2} \end{pmatrix} = \begin{pmatrix} 8 & -2 \\ -2 & 6 \end{pmatrix} = 2K, \]

which is exactly twice the quadratic coefficient matrix (5.12). If the Hessian matrix is positive definite (which we already know in this case), then the critical point is indeed a (local) minimum.

Thus, the calculus and algebraic approaches to this minimization problem lead, as they must, to identical results. However, the algebraic method is more powerful, because it immediately produces the unique, global minimum, whereas, barring additional work, calculus can guarantee only that the critical point is a local minimum. Moreover, the proof of the calculus local minimization criterion (that the Hessian matrix be positive definite at the critical point) relies, in fact, on the algebraic solution to the quadratic minimization problem! In summary: minimization of quadratic functions is a problem in linear algebra, while minimizing more complicated functions requires the full force of multivariable calculus.

The most efficient method for producing a minimum of a quadratic function $p(\mathbf{x})$ on $\mathbb{R}^n$, then, is to first write out the symmetric coefficient matrix $K$ and the vector $\mathbf{f}$ as in (5.10). Solving the system $K\mathbf{x} = \mathbf{f}$ will produce the minimizer $\mathbf{x}^*$ provided $K > 0$, which should be checked during the course of the procedure using the criteria of Theorem 3.43, that is, making sure that no row interchanges are used and all the pivots are positive.
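The procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production solver: the function name `minimize_quadratic` is hypothetical, $K$ is assumed to be symmetric, and exact rational arithmetic replaces floating point so that the positivity of each pivot can be tested without rounding concerns. The sample call uses the data (5.12) of Example 5.1.

```python
from fractions import Fraction as Fr

def minimize_quadratic(K, f, c):
    """Minimize p(x) = x^T K x - 2 x^T f + c for a symmetric matrix K.

    Runs Gaussian Elimination without row interchanges on the augmented
    matrix (K | f). Returns (x_star, p_min) if every pivot is positive,
    i.e. K is positive definite; otherwise returns None.
    """
    n = len(K)
    M = [[Fr(v) for v in row] + [Fr(fi)] for row, fi in zip(K, f)]
    for j in range(n):
        if M[j][j] <= 0:
            return None              # non-positive pivot: K is not positive definite
        for i in range(j + 1, n):
            m = M[i][j] / M[j][j]
            for k in range(j, n + 1):
                M[i][k] -= m * M[j][k]
    # Back Substitution on the resulting upper triangular system
    x = [Fr(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    # Minimum value c - f^T x*, one of the expressions in (5.14)
    p_min = Fr(c) - sum(Fr(fi) * xi for fi, xi in zip(f, x))
    return x, p_min

# Example 5.1: K, f, c from (5.12)
x_star, p_min = minimize_quadratic([[4, -1], [-1, 3]], [Fr(-3, 2), 1], 1)
```

Returning `None` as soon as a non-positive pivot appears mirrors the advice above: positive definiteness is checked in the course of the elimination rather than as a separate step.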


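Example 5.3. Let us minimize the quadratic function

\[ p(x, y, z) = x^2 + 2xy + xz + 2y^2 + yz + 2z^2 + 6y - 7z + 5. \]

This has the matrix form (5.10) with

\[ K = \begin{pmatrix} 1 & 1 & \frac{1}{2} \\ 1 & 2 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 2 \end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \qquad \mathbf{f} = \begin{pmatrix} 0 \\ -3 \\ \frac{7}{2} \end{pmatrix}, \qquad c = 5, \]

and the minimum is found by solving the linear system $K\mathbf{x} = \mathbf{f}$. Gaussian Elimination produces the $L D L^T$ factorization

\[ K = \begin{pmatrix} 1 & 1 & \frac{1}{2} \\ 1 & 2 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ \frac{1}{2} & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{7}{4} \end{pmatrix} \begin{pmatrix} 1 & 1 & \frac{1}{2} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

The pivots, i.e., the diagonal entries of $D$, are all positive, and hence $K$ is positive definite. Theorem 5.2 then guarantees that $p(x, y, z)$ has a unique minimizer, which is found by solving the linear system $K\mathbf{x} = \mathbf{f}$. The solution is then quickly obtained by forward and back substitution:

\[ x^* = 2, \quad y^* = -3, \quad z^* = 2, \qquad \text{with} \qquad p(x^*, y^*, z^*) = p(2, -3, 2) = -11, \]

and we conclude that $p(x, y, z) > p(2, -3, 2) = -11$ for all $(x, y, z) \neq (2, -3, 2)$.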


Theorem 5.2 solves the general quadratic minimization problem when the quadratic coefficient matrix is positive definite. If $K$ is not positive definite, then the quadratic function (5.10) does not have a minimum, apart from one exceptional situation.

Theorem 5.4. If the matrix $K$ is positive definite, then the quadratic function (5.10) has a unique global minimizer $\mathbf{x}^*$ satisfying $K\mathbf{x}^* = \mathbf{f}$. If $K$ is only positive semi-definite, and $\mathbf{f} \in \operatorname{img} K$, then every solution to the linear system $K\mathbf{x}^* = \mathbf{f}$ is a global minimum of $p(\mathbf{x})$, but the minimum is not unique, since $p(\mathbf{x}^* + \mathbf{z}) = p(\mathbf{x}^*)$ whenever $\mathbf{z} \in \ker K$. In all other cases, $p(\mathbf{x})$ has no global minimum.

Proof: The first part is merely a restatement of Theorem 5.2. The second part is proved by a similar computation, and is left to the reader. If $K$ is not positive semi-definite, then one can find a vector $\mathbf{y}$ such that $a = \mathbf{y}^T K \mathbf{y} < 0$. If we set $\mathbf{x} = t\,\mathbf{y}$, then $p(\mathbf{x}) = p(t\,\mathbf{y}) = a t^2 - 2 b t + c$, with $b = \mathbf{y}^T \mathbf{f}$. Since $a < 0$, by choosing $|t| \gg 0$ sufficiently large, we can arrange that $p(t\,\mathbf{y}) \ll 0$ is an arbitrarily large negative quantity, and so $p$ has no (finite) minimum value. The one remaining case, when $K$ is positive semi-definite but $\mathbf{f} \notin \operatorname{img} K$, is the subject of Exercise 5.2.14. Q.E.D.
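Two of the cases in Theorem 5.4 can be illustrated numerically. The matrices and vectors below are hypothetical data chosen here for illustration (they do not appear in the text), and `p` is simply a direct evaluation of (5.10). The first part checks that shifting a minimizer by a kernel vector leaves the value unchanged in the semi-definite case; the second shows the values $p(t\,\mathbf{y})$ decreasing without bound when $\mathbf{y}^T K \mathbf{y} < 0$.

```python
def p(K, f, c, x):
    """Evaluate p(x) = x^T K x - 2 x^T f + c."""
    n = len(x)
    quad = sum(K[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return quad - 2 * sum(fi * xi for fi, xi in zip(f, x)) + c

# Semi-definite case (hypothetical data): ker K is spanned by z = (1, -1),
# and f = K (2, 0)^T, so f lies in img K.
K_semi = [[1, 1], [1, 1]]
f_semi = [2, 2]
x_star = [2, 0]                      # one solution of K x = f
z = [1, -1]                          # K z = 0
shifted = [x_star[i] + 5 * z[i] for i in range(2)]
same_value = p(K_semi, f_semi, 0, x_star) == p(K_semi, f_semi, 0, shifted)

# Indefinite case (hypothetical data): y = (0, 1) gives y^T K y = -1 < 0,
# so p(t y) = -t^2 tends to minus infinity as |t| grows.
K_indef = [[1, 0], [0, -1]]
values = [p(K_indef, [0, 0], 0, [0, t]) for t in (1, 10, 100)]
```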


Exercises 


5.2.1. Find the minimum value of the function $f(x, y, z) = x^2 + 2xy + 3y^2 + 2yz + z^2 - 2x + 3z + 2$. How do you know that your answer is really the global minimum?

5.2.2.  For  the  potential  energy  function  in  (5.1),  where  is  the  equilibrium  position  of  the  ball? 

5.2.3. For each of the following quadratic functions, determine whether there is a minimum. If so, find the minimizer and the minimum value for the function.

(a) $x^2 - 2xy + 4y^2 + x - 1$, (b) $3x^2 + 3xy + 3y^2 - 2x - 2y + 4$, (c) $x^2 + 5xy + 3y^2 + 2x - y$,

(d) $x^2 + y^2 + yz + z^2 + x + y - z$, (e) $x^2 + xy - y^2 - yz + z^2 - 3$,

(f) $x^2 + 5xz + y^2 - 2yz + z^2 + 2x - z - 3$, (g) $x^2 + xy + y^2 + yz + z^2 + zw + w^2 - 2x - w$.

5.2.4. (a) For which numbers $b$ (allowing both positive and negative numbers) is the matrix $A = \begin{pmatrix} 1 & b \\ b & 4 \end{pmatrix}$ positive definite? (b) Find the factorization $A = L D L^T$ when $b$ is in the range for positive definiteness. (c) Find the minimum value (depending on $b$; it might be finite or it might be $-\infty$) of the function $p(x, y) = x^2 + 2bxy + 4y^2 - 2y$.

5.2.5. For each matrix $K$, vector $\mathbf{f}$, and scalar $c$, write out the quadratic function $p(\mathbf{x})$ given by (5.10). Then either find the minimizer $\mathbf{x}^*$ and minimum value $p(\mathbf{x}^*)$, or explain why there


is  none,  (a)  K  = 


c  =  0;  (c)  K  = 


f  = 


f  = 


1 

2 


( 


V 


c  =  3;  (b)  K  = 


c  =  -3;  (d)  K  = 


3  2 
2  1 

(1 

1 

VI 


f  = 

1 

2  - 
1 


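c = 1; (e) $K = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 3 & 1 \\ 0 & 0 & 1 & 4 \end{pmatrix}$, $\mathbf{f} = (-1, 2, -3, 4)^T$, $c = 0$.

5.2.6. Find the minimum value of the quadratic function

\[ p(x_1, \ldots, x_n) = 4 \sum_{i=1}^{n} x_i^2 \;-\; 2 \sum_{i=1}^{n-1} x_i x_{i+1} \;+\; \sum_{i=1}^{n} x_i \qquad \text{for} \qquad n = 2, 3, 4. \]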


5.2.7. Find the maximum value of the quadratic functions

(a) $-x^2 + 3xy - 5y^2 - x + 1$, (b) $-2x^2 + 3xy - 3y^2 + 4x - 3y$.

5.2.8. Suppose $K_1$ and $K_2$ are positive definite $n \times n$ matrices. Suppose that, for $i = 1, 2$, the minimizer of $p_i(\mathbf{x}) = \mathbf{x}^T K_i \mathbf{x} - 2\,\mathbf{x}^T \mathbf{f}_i + c_i$ is $\mathbf{x}_i^*$. Is the minimizer of $p(\mathbf{x}) = p_1(\mathbf{x}) + p_2(\mathbf{x})$ given by $\mathbf{x}^* = \mathbf{x}_1^* + \mathbf{x}_2^*$? Prove or give a counterexample.

♦ 5.2.9. Let $K > 0$. Prove that a quadratic function $p(\mathbf{x}) = \mathbf{x}^T K \mathbf{x} - 2\,\mathbf{x}^T \mathbf{f}$ without constant term has non-positive minimum value: $p(\mathbf{x}^*) \leq 0$. When is the minimum value zero?

5.2.10. Let $q(\mathbf{x}) = \mathbf{x}^T K \mathbf{x}$ be a quadratic form. Prove that the minimum value of $q(\mathbf{x})$ is either $0$ or $-\infty$.

5.2.11. Under what conditions does the affine function $p(\mathbf{x}) = \mathbf{x}^T \mathbf{f} + c$ have a minimum?

♦ 5.2.12. Under what conditions does a quadratic function $p(\mathbf{x}) = \mathbf{x}^T K \mathbf{x} - 2\,\mathbf{x}^T \mathbf{f} + c$ have a finite global maximum? Explain how to find the maximizer and maximum value.

5.2.13. True or false: The minimal-norm solution to $A\mathbf{x} = \mathbf{b}$ is obtained by setting all the free variables to zero.

♦ 5.2.14. Prove that if $K$ is a positive semi-definite matrix, and $\mathbf{f} \notin \operatorname{img} K$, then the quadratic function $p(\mathbf{x}) = \mathbf{x}^T K \mathbf{x} - 2\,\mathbf{x}^T \mathbf{f} + c$ has no minimum value.

Hint: Try looking at vectors $\mathbf{x} \in \ker K$.

5.2.15. Why can’t you minimize a complex-valued quadratic function?


5.3  The  Closest  Point 


We are now ready to solve the geometric problem of finding the element in a prescribed subspace that lies closest to a given point. For simplicity, we work mostly with subspaces of $\mathbb{R}^m$, equipped with the Euclidean norm and inner product, but the method extends straightforwardly to arbitrary finite-dimensional subspaces of any inner product space. However, it does not apply to more general norms not associated with inner products, such as the 1 norm, the $\infty$ norm and, in fact, the $p$ norms whenever $p \neq 2$. In such cases, the closest point problem is a nonlinear minimization problem whose solution requires more sophisticated analytical techniques; see, for example, [28, 66, 79].


Problem. Let $\mathbb{R}^m$ be equipped with an inner product $\langle \mathbf{v}, \mathbf{w} \rangle$ and associated norm $\|\mathbf{v}\|$, and let $W \subset \mathbb{R}^m$ be a subspace. Given $\mathbf{b} \in \mathbb{R}^m$, the goal is to find the point $\mathbf{w}^* \in W$ that minimizes $\|\mathbf{w} - \mathbf{b}\|$ over all possible $\mathbf{w} \in W$. The minimal distance $d^* = \|\mathbf{w}^* - \mathbf{b}\|$ to the closest point is designated as the distance from the point $\mathbf{b}$ to the subspace $W$.


Of course, if $\mathbf{b} \in W$ lies in the subspace, then the answer is easy: the closest point in $W$ is $\mathbf{w}^* = \mathbf{b}$ itself, and the distance from $\mathbf{b}$ to the subspace is zero. Thus, the problem becomes interesting only when $\mathbf{b} \notin W$.

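In solving the closest point problem, the goal is to minimize the squared distance

\[ \|\mathbf{w} - \mathbf{b}\|^2 = \langle \mathbf{w} - \mathbf{b}, \mathbf{w} - \mathbf{b} \rangle = \|\mathbf{w}\|^2 - 2 \langle \mathbf{w}, \mathbf{b} \rangle + \|\mathbf{b}\|^2 \tag{5.18} \]

over all possible $\mathbf{w}$ belonging to the subspace $W \subset \mathbb{R}^m$. Let us assume that we know a basis $\mathbf{w}_1, \ldots, \mathbf{w}_n$ of $W$, with $n = \dim W$. Then the most general vector in $W$ is a linear combination

\[ \mathbf{w} = x_1 \mathbf{w}_1 + \cdots + x_n \mathbf{w}_n \tag{5.19} \]

of the basis vectors. We substitute the formula (5.19) for $\mathbf{w}$ into the squared distance function (5.18). As we shall see, the resulting expression is a quadratic function of the coefficients $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, and so the minimum is provided by Theorem 5.2.

First, the quadratic terms come from expanding

\[ \|\mathbf{w}\|^2 = \langle x_1 \mathbf{w}_1 + \cdots + x_n \mathbf{w}_n \,,\, x_1 \mathbf{w}_1 + \cdots + x_n \mathbf{w}_n \rangle = \sum_{i,j=1}^{n} x_i x_j \langle \mathbf{w}_i, \mathbf{w}_j \rangle. \tag{5.20} \]

Therefore,

\[ \|\mathbf{w}\|^2 = \sum_{i,j=1}^{n} k_{ij}\, x_i x_j = \mathbf{x}^T K \mathbf{x}, \]

where $K$ is the symmetric $n \times n$ Gram matrix whose $(i, j)$ entry is the inner product

\[ k_{ij} = \langle \mathbf{w}_i, \mathbf{w}_j \rangle \tag{5.21} \]

between the basis vectors of our subspace; see Definition 3.33. Similarly,

\[ \langle \mathbf{w}, \mathbf{b} \rangle = \langle x_1 \mathbf{w}_1 + \cdots + x_n \mathbf{w}_n \,,\, \mathbf{b} \rangle = \sum_{i=1}^{n} x_i \langle \mathbf{w}_i, \mathbf{b} \rangle, \]

and so

\[ \langle \mathbf{w}, \mathbf{b} \rangle = \sum_{i=1}^{n} x_i f_i = \mathbf{x}^T \mathbf{f}, \]

where $\mathbf{f} \in \mathbb{R}^n$ is the vector whose $i$th entry is the inner product

\[ f_i = \langle \mathbf{w}_i, \mathbf{b} \rangle \tag{5.22} \]

between the point and the subspace’s basis elements. Substituting back, we conclude that the squared distance function (5.18) reduces to the quadratic function

\[ p(\mathbf{x}) = \mathbf{x}^T K \mathbf{x} - 2\, \mathbf{x}^T \mathbf{f} + c = \sum_{i,j=1}^{n} k_{ij}\, x_i x_j - 2 \sum_{i=1}^{n} f_i x_i + c, \tag{5.23} \]

in which $K$ and $\mathbf{f}$ are given in (5.21-22), while $c = \|\mathbf{b}\|^2$.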

Since we assumed that the basis vectors $\mathbf{w}_1, \ldots, \mathbf{w}_n$ are linearly independent, Proposition 3.36 assures us that their associated Gram matrix is positive definite. Therefore, we may directly apply our basic Minimization Theorem 5.2 to solve the closest point problem.

Theorem 5.5. Let $\mathbf{w}_1, \ldots, \mathbf{w}_n$ form a basis for the subspace $W \subset \mathbb{R}^m$. Given $\mathbf{b} \in \mathbb{R}^m$, the closest point $\mathbf{w}^* = x_1^* \mathbf{w}_1 + \cdots + x_n^* \mathbf{w}_n \in W$ is unique and prescribed by the solution $\mathbf{x}^* = K^{-1} \mathbf{f}$ to the linear system

\[ K \mathbf{x} = \mathbf{f}, \tag{5.24} \]

where the entries of $K$ and $\mathbf{f}$ are given in (5.21-22). The (minimum) distance between the point and the subspace is

\[ d^* = \|\mathbf{w}^* - \mathbf{b}\| = \sqrt{\|\mathbf{b}\|^2 - \mathbf{f}^T \mathbf{x}^*}. \tag{5.25} \]


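When the standard dot product and Euclidean norm on $\mathbb{R}^m$ are used to measure distance, the entries of the Gram matrix $K$ and the vector $\mathbf{f}$ are given by

\[ k_{ij} = \mathbf{w}_i \cdot \mathbf{w}_j = \mathbf{w}_i^T \mathbf{w}_j, \qquad f_i = \mathbf{w}_i \cdot \mathbf{b} = \mathbf{w}_i^T \mathbf{b}. \]

As in (3.62), each set of equations can be combined into a single matrix equation. If $A = (\, \mathbf{w}_1 \; \mathbf{w}_2 \; \ldots \; \mathbf{w}_n \,)$ denotes the $m \times n$ matrix formed by the basis vectors, then

\[ K = A^T A, \qquad \mathbf{f} = A^T \mathbf{b}, \qquad c = \mathbf{b}^T \mathbf{b} = \|\mathbf{b}\|^2. \tag{5.26} \]

A direct derivation of these equations is instructive. Since, by formula (2.13),

\[ \mathbf{w} = x_1 \mathbf{w}_1 + \cdots + x_n \mathbf{w}_n = A \mathbf{x}, \]

we have

\[ \begin{aligned} \|\mathbf{w} - \mathbf{b}\|^2 = \|A\mathbf{x} - \mathbf{b}\|^2 &= (A\mathbf{x} - \mathbf{b})^T (A\mathbf{x} - \mathbf{b}) = (\mathbf{x}^T A^T - \mathbf{b}^T)(A\mathbf{x} - \mathbf{b}) \\ &= \mathbf{x}^T A^T A\, \mathbf{x} - 2\, \mathbf{x}^T A^T \mathbf{b} + \mathbf{b}^T \mathbf{b} = \mathbf{x}^T K \mathbf{x} - 2\, \mathbf{x}^T \mathbf{f} + c, \end{aligned} \]

thereby justifying (5.26). Thus, Theorem 5.5 implies that the closest point $\mathbf{w}^* = A\mathbf{x}^* \in W$ to $\mathbf{b}$ in the Euclidean norm is obtained by solving what are known as the normal equations

\[ (A^T A)\, \mathbf{x} = A^T \mathbf{b} \tag{5.27} \]

for

\[ \mathbf{x}^* = (A^T A)^{-1} A^T \mathbf{b}, \qquad \text{giving} \qquad \mathbf{w}^* = A \mathbf{x}^* = A (A^T A)^{-1} A^T \mathbf{b}. \tag{5.28} \]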


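If, instead of the Euclidean inner product, we adopt a weighted inner product $\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T C\, \mathbf{w}$ on $\mathbb{R}^m$ prescribed by a positive definite $m \times m$ matrix $C > 0$, then the same computations produce

\[ K = A^T C A, \qquad \mathbf{f} = A^T C\, \mathbf{b}, \qquad c = \mathbf{b}^T C\, \mathbf{b} = \|\mathbf{b}\|^2. \tag{5.29} \]

The resulting formula for the weighted Gram matrix $K$ was previously derived in (3.64). In this case, the closest point $\mathbf{w}^* \in W$ in the weighted norm is obtained by solving the weighted normal equations

\[ A^T C A\, \mathbf{x} = A^T C\, \mathbf{b}, \tag{5.30} \]

so that

\[ \mathbf{x}^* = (A^T C A)^{-1} A^T C\, \mathbf{b}, \qquad \mathbf{w}^* = A \mathbf{x}^* = A (A^T C A)^{-1} A^T C\, \mathbf{b}. \tag{5.31} \]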

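Example 5.6. Let $W \subset \mathbb{R}^3$ be the plane spanned by $\mathbf{w}_1 = (1, 2, -1)^T$, $\mathbf{w}_2 = (2, -3, -1)^T$. Our goal is to find the point $\mathbf{w}^* \in W$ closest to $\mathbf{b} = (1, 0, 0)^T$, where distance is measured in the usual Euclidean norm. We combine the basis vectors to form the matrix

\[ A = \begin{pmatrix} 1 & 2 \\ 2 & -3 \\ -1 & -1 \end{pmatrix}. \]

According to (5.26), the positive definite Gram matrix and associated vector are

\[ K = A^T A = \begin{pmatrix} 6 & -3 \\ -3 & 14 \end{pmatrix}, \qquad \mathbf{f} = A^T \mathbf{b} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}. \]

(Alternatively, these can be computed directly by taking inner products, as in (5.21-22).) We solve the linear system

\[ K \mathbf{x} = \mathbf{f} \qquad \text{for} \qquad \mathbf{x}^* = K^{-1} \mathbf{f} = \begin{pmatrix} \frac{4}{15} \\ \frac{1}{5} \end{pmatrix}. \]

Theorem 5.5 implies that the closest point is

\[ \mathbf{w}^* = A \mathbf{x}^* = x_1^* \mathbf{w}_1 + x_2^* \mathbf{w}_2 = \begin{pmatrix} \frac{2}{3} \\ -\frac{1}{15} \\ -\frac{7}{15} \end{pmatrix} \approx \begin{pmatrix} .6667 \\ -.0667 \\ -.4667 \end{pmatrix}. \]

The distance from the point $\mathbf{b}$ to the plane is $d^* = \|\mathbf{w}^* - \mathbf{b}\| = \frac{1}{\sqrt{3}} \approx .5774$.

Suppose, on the other hand, that distance in $\mathbb{R}^3$ is measured in the weighted norm $\|\mathbf{v}\| = \sqrt{v_1^2 + \frac{1}{2} v_2^2 + \frac{1}{3} v_3^2}$ corresponding to the positive definite diagonal matrix $C = \operatorname{diag}\left(1, \frac{1}{2}, \frac{1}{3}\right)$. In this case, we form the weighted Gram matrix and vector (5.29):

\[ K = A^T C A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & -3 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{3} \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 2 & -3 \\ -1 & -1 \end{pmatrix} = \begin{pmatrix} \frac{10}{3} & -\frac{2}{3} \\ -\frac{2}{3} & \frac{53}{6} \end{pmatrix}, \]

\[ \mathbf{f} = A^T C\, \mathbf{b} = \begin{pmatrix} 1 & 2 & -1 \\ 2 & -3 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{3} \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \]

and so

\[ \mathbf{x}^* = K^{-1} \mathbf{f} \approx \begin{pmatrix} .3506 \\ .2529 \end{pmatrix}, \qquad \mathbf{w}^* = A \mathbf{x}^* \approx \begin{pmatrix} .8563 \\ -.0575 \\ -.6034 \end{pmatrix}. \]

Now the distance between the point and the subspace is measured in the weighted norm: $d^* = \|\mathbf{w}^* - \mathbf{b}\| \approx .3790$.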
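The computations of Example 5.6 can be reproduced with a short Python sketch of the (weighted) normal equations (5.27) and (5.30). The function names `solve` and `closest_point` are assumptions made here for illustration; the basis is assumed to be linearly independent and $C$ positive definite, so that elimination needs no row interchanges. Exact fractions are used, so the results can be compared with the values above.

```python
from fractions import Fraction as Fr

def solve(K, f):
    """Solve K x = f by Gaussian Elimination and Back Substitution.
    Assumes K is positive definite, so no row interchanges are needed."""
    n = len(K)
    M = [[Fr(v) for v in row] + [Fr(fi)] for row, fi in zip(K, f)]
    for j in range(n):
        for i in range(j + 1, n):
            m = M[i][j] / M[j][j]
            for k in range(j, n + 1):
                M[i][k] -= m * M[j][k]
    x = [Fr(0)] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def closest_point(basis, b, C=None):
    """Closest point to b in span(basis) for the inner product <v, w> = v^T C w.

    basis: linearly independent vectors w_1, ..., w_n (the columns of A).
    C: positive definite m x m matrix; None means the identity (dot product).
    Returns (x_star, w_star, squared_distance).
    """
    m = len(b)
    if C is None:
        C = [[1 if i == j else 0 for j in range(m)] for i in range(m)]

    def inner(v, w):
        return sum(Fr(v[i]) * C[i][j] * w[j] for i in range(m) for j in range(m))

    K = [[inner(wi, wj) for wj in basis] for wi in basis]    # K = A^T C A
    f = [inner(wi, b) for wi in basis]                       # f = A^T C b
    x = solve(K, f)
    w_star = [sum(x[k] * basis[k][i] for k in range(len(basis))) for i in range(m)]
    dist2 = inner(b, b) - sum(fi * xi for fi, xi in zip(f, x))   # ||b||^2 - f^T x*
    return x, w_star, dist2

w1, w2, b = [1, 2, -1], [2, -3, -1], [1, 0, 0]
x_euclid, w_euclid, d2_euclid = closest_point([w1, w2], b)
C = [[1, 0, 0], [0, Fr(1, 2), 0], [0, 0, Fr(1, 3)]]
x_weighted, w_weighted, d2_weighted = closest_point([w1, w2], b, C)
```

The squared distance is returned rather than the distance itself so that everything stays rational; taking a square root at the end recovers $d^*$ as in (5.25).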


Remark. The solution to the closest point problem given in Theorem 5.5 applies, as stated, to the more general case in which $W \subset V$ is a finite-dimensional subspace of a general inner product space $V$. The underlying inner product space $V$ can even be infinite-dimensional, as, for example, in least squares approximations in function space.

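Now, consider what happens if we know an orthonormal basis $\mathbf{u}_1, \ldots, \mathbf{u}_n$ of the subspace $W$. Since, by definition, $\langle \mathbf{u}_i, \mathbf{u}_j \rangle = 0$ for $i \neq j$, while $\langle \mathbf{u}_i, \mathbf{u}_i \rangle = \|\mathbf{u}_i\|^2 = 1$, the associated Gram matrix is the identity matrix: $K = I$. Thus, in this situation, the system (5.24) reduces to simply $\mathbf{x} = \mathbf{f}$, with solution $x_i^* = f_i = \langle \mathbf{u}_i, \mathbf{b} \rangle$, and the closest point is given by

\[ \mathbf{w}^* = x_1^* \mathbf{u}_1 + \cdots + x_n^* \mathbf{u}_n, \qquad \text{where} \qquad x_i^* = \langle \mathbf{b}, \mathbf{u}_i \rangle, \quad i = 1, \ldots, n. \tag{5.32} \]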

We  have  already  seen  this  formula!  According  to  Theorem  4.32,  w*  is  the  orthogonal 
projection  of  b  onto  the  subspace  W.  Thus,  if  we  are  supplied  with  an  orthonormal  basis 
of  our  subspace,  we  can  easily  compute  the  closest  point  using  the  orthogonal  projection 
formula  (5.32).  If  the  basis  is  orthogonal,  one  can  either  normalize  it  or  directly  apply  the 
equivalent  orthogonal  projection  formula  (4.42). 

In  this  manner,  we  have  established  the  key  connection  identifying  the  closest  point 
in  the  subspace  to  a  given  vector  with  the  orthogonal  projection  of  that  vector  onto  the 
subspace. 


Theorem 5.7. Let $W \subset V$ be a finite-dimensional subspace of an inner product space. Given a point $\mathbf{b} \in V$, the closest point $\mathbf{w}^* \in W$ coincides with the orthogonal projection of $\mathbf{b}$ onto $W$.


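Example 5.8. Let $\mathbb{R}^4$ be equipped with the ordinary Euclidean norm. Consider the three-dimensional subspace $W \subset \mathbb{R}^4$ spanned by the orthogonal vectors $\mathbf{v}_1 = (1, -1, 2, 0)^T$, $\mathbf{v}_2 = (0, 2, 1, -2)^T$, $\mathbf{v}_3 = (1, 1, 0, 1)^T$. Given $\mathbf{b} = (1, 2, 2, 1)^T$, our task is to find the closest point $\mathbf{w}^* \in W$. Since the spanning vectors are orthogonal (but not orthonormal), we can use the orthogonal projection formula (4.42) to find $\mathbf{w}^* = x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 + x_3 \mathbf{v}_3$, with

\[ x_1 = \frac{\langle \mathbf{b}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2} = \frac{3}{6} = \frac{1}{2}, \qquad x_2 = \frac{\langle \mathbf{b}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2} = \frac{4}{9}, \qquad x_3 = \frac{\langle \mathbf{b}, \mathbf{v}_3 \rangle}{\|\mathbf{v}_3\|^2} = \frac{4}{3}. \]

Thus, the closest point to $\mathbf{b}$ in the given subspace is

\[ \mathbf{w}^* = \tfrac{1}{2} \mathbf{v}_1 + \tfrac{4}{9} \mathbf{v}_2 + \tfrac{4}{3} \mathbf{v}_3 = \left( \tfrac{11}{6}, \tfrac{31}{18}, \tfrac{13}{9}, \tfrac{4}{9} \right)^T. \]

We further note that, in accordance with the orthogonal projection property, the vector

\[ \mathbf{z} = \mathbf{b} - \mathbf{w}^* = \left( -\tfrac{5}{6}, \tfrac{5}{18}, \tfrac{5}{9}, \tfrac{5}{9} \right)^T \]

is orthogonal to $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ and hence to the entire subspace.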
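The projection in Example 5.8 is easy to check by machine. The following Python sketch applies the orthogonal projection formula to the vectors of the example and then tests that the residual $\mathbf{z} = \mathbf{b} - \mathbf{w}^*$ is orthogonal to each basis vector. The helper names `dot` and `project_orthogonal` are illustrative, and the function assumes (without checking) that the supplied vectors are mutually orthogonal.

```python
from fractions import Fraction as Fr

def dot(v, w):
    """Euclidean dot product, computed exactly."""
    return sum(Fr(a) * b for a, b in zip(v, w))

def project_orthogonal(basis, b):
    """Orthogonal projection of b onto the span of mutually orthogonal vectors.

    Returns (coeffs, w) with coeffs[i] = <b, v_i> / ||v_i||^2.
    """
    coeffs = [dot(b, v) / dot(v, v) for v in basis]
    w = [sum(x * v[k] for x, v in zip(coeffs, basis)) for k in range(len(b))]
    return coeffs, w

v1, v2, v3 = [1, -1, 2, 0], [0, 2, 1, -2], [1, 1, 0, 1]
b = [1, 2, 2, 1]
coeffs, w_star = project_orthogonal([v1, v2, v3], b)
z = [bk - wk for bk, wk in zip(b, w_star)]          # residual b - w*
residual_is_orthogonal = all(dot(z, v) == 0 for v in (v1, v2, v3))
```

No linear system is solved here: because the Gram matrix of an orthogonal basis is diagonal, each coefficient is obtained from a single division.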


Even  when  we  only  know  a  non-orthogonal  basis  for  the  subspace,  it  may  still  be  a  good 
strategy  to  first  apply  the  Gram-Schmidt  process  in  order  to  replace  it  by  an  orthonormal  or 
orthogonal  basis,  and  then  apply  the  relevant  orthogonal  projection  formula  to  calculate  the 
closest  point.  Not  only  does  this  simplify  the  final  computation,  it  can  often  ameliorate  the 
numerical  inaccuracies  associated  with  ill-conditioning  that  can  afflict  the  direct  solution 
to  the  system  (5.24).  The  following  example  illustrates  this  alternative  procedure. 


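Example 5.9. Let us return to the problem, solved in Example 5.6, of finding the closest point in the plane $W$ spanned by $\mathbf{w}_1 = (1, 2, -1)^T$, $\mathbf{w}_2 = (2, -3, -1)^T$ to the point $\mathbf{b} = (1, 0, 0)^T$. We proceed by first using the Gram-Schmidt process to compute an orthogonal basis

\[ \mathbf{v}_1 = \mathbf{w}_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad \mathbf{v}_2 = \mathbf{w}_2 - \frac{\mathbf{w}_2 \cdot \mathbf{v}_1}{\|\mathbf{v}_1\|^2}\, \mathbf{v}_1 = \begin{pmatrix} \frac{5}{2} \\ -2 \\ -\frac{3}{2} \end{pmatrix}, \]

for our subspace. As a result, we can use the orthogonal projection formula (4.42) to produce the closest point

\[ \mathbf{w}^* = \frac{\mathbf{b} \cdot \mathbf{v}_1}{\|\mathbf{v}_1\|^2}\, \mathbf{v}_1 + \frac{\mathbf{b} \cdot \mathbf{v}_2}{\|\mathbf{v}_2\|^2}\, \mathbf{v}_2 = \begin{pmatrix} \frac{2}{3} \\ -\frac{1}{15} \\ -\frac{7}{15} \end{pmatrix}, \]

reconfirming our earlier result.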
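The two-step procedure of Example 5.9 can be sketched in Python as follows: first orthogonalize the basis, then project. This is a minimal illustration with hypothetical helper names (`dot`, `gram_schmidt`); it uses the classical Gram-Schmidt recursion in exact arithmetic and assumes the input vectors are linearly independent, so no division by zero occurs.

```python
from fractions import Fraction as Fr

def dot(v, w):
    """Euclidean dot product, computed exactly."""
    return sum(Fr(a) * b for a, b in zip(v, w))

def gram_schmidt(vectors):
    """Return an orthogonal (not normalized) basis for span(vectors)."""
    ortho = []
    for w in vectors:
        v = [Fr(c) for c in w]
        for u in ortho:
            coef = dot(w, u) / dot(u, u)      # component of w along u
            v = [vk - coef * uk for vk, uk in zip(v, u)]
        ortho.append(v)
    return ortho

w1, w2, b = [1, 2, -1], [2, -3, -1], [1, 0, 0]
v1, v2 = gram_schmidt([w1, w2])

# Orthogonal projection of b onto span{v1, v2}
w_star = [dot(b, v1) / dot(v1, v1) * a + dot(b, v2) / dot(v2, v2) * c
          for a, c in zip(v1, v2)]
```

In exact arithmetic this agrees with the normal equations of Example 5.6; the numerical advantage mentioned above only becomes visible in floating point, where a stabilized variant of Gram-Schmidt would normally be preferred.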


Exercises 


Note :  Unless  otherwise  indicated,  “distance”  refers  to  the  Euclidean  norm. 

5.3.1. Find the closest point in the plane spanned by $(1, 2, -1)^T$, $(0, -1, 3)^T$ to the point $(1, 1, 1)^T$. What is the distance between the point and the plane?

5.3.2. Redo Exercise 5.3.1 using

(a) the weighted inner product $\langle \mathbf{v}, \mathbf{w} \rangle = 2 v_1 w_1 + 4 v_2 w_2 + 3 v_3 w_3$; (b) the inner product

/  2  -1  0 


m 

( v  ,  w )  =  v  C  w  based  on  the  positive  definite  matrix  C  = 


-1  2 
\  0  -1 


5.3.3. Find the point in the plane $x + 2y - z = 0$ that is closest to $(0, 0, 1)^T$.

5.3.4. Let $\mathbf{b} = (3, 1, 2, 1)^T$. Find the closest point and the distance from $\mathbf{b}$ to the following subspaces: (a) the line in the direction $(1, 1, 1, 1)^T$; (b) the plane spanned by $(1, 1, 0, 0)^T$ and $(0, 0, 1, 1)^T$; (c) the hyperplane spanned by $(1, 0, 0, 0)^T$, $(0, 1, 0, 0)^T$, $(0, 0, 1, 0)^T$; (d) the hyperplane defined by the equation $x + y + z + w = 0$.

5.3.5. Find the closest point and the distance from $\mathbf{b} = (1, 1, 2, -2)^T$ to the subspace spanned by $(1, 2, -1, 0)^T$, $(0, 1, -2, -1)^T$, $(1, 0, 3, 2)^T$.

5.3.6. Redo Exercises 5.3.4 and 5.3.5 using

(i) the weighted inner product $\langle \mathbf{v}, \mathbf{w} \rangle = \ldots\, w_3 + v_4 w_4$;

(ii) the inner product based on the positive definite matrix $C = \begin{pmatrix} 4 & -1 & 1 & 0 \\ -1 & 4 & -1 & 1 \\ 1 & -1 & 4 & -1 \\ 0 & 1 & -1 & 4 \end{pmatrix}$.

5.3.7. Find the vector $\mathbf{w}^* \in \operatorname{span} \{\, (0, 0, 1, 1)^T, (2, 1, 1, 1)^T \,\}$ that minimizes $\|\mathbf{w} - (0, 3, 1, 2)^T\|$.

5.3.8. (a) Find the distance from the point $\mathbf{b} = (1, 2, -1)^T$ to the plane $x - 2y + z = 0$.

(b) Find the distance to the plane $x - 2y + z = 3$.

Hint: Move the point and the plane so that the plane goes through the origin.

♦ 5.3.9. (a) Given a configuration of $n$ points $\mathbf{a}_1, \ldots, \mathbf{a}_n$ in the plane, explain how to find the point $\mathbf{x} \in \mathbb{R}^2$ that minimizes the total squared distance $\sum_{i=1}^{n} \|\mathbf{x} - \mathbf{a}_i\|^2$. (b) Apply your method when (i) $\mathbf{a}_1 = (1, 3)$, $\mathbf{a}_2 = (-2, 5)$; (ii) $\mathbf{a}_1 = (0, 0)$, $\mathbf{a}_2 = (0, 1)$, $\mathbf{a}_3 = (1, 0)$; (iii) $\mathbf{a}_1 = (0, 0)$, $\mathbf{a}_2 = (0, 2)$, $\mathbf{a}_3 = (1, 2)$, $\mathbf{a}_4 = (-2, -1)$.

5.3.10. Answer Exercise 5.3.9 when distance is measured in (a) the weighted norm $\|\mathbf{x}\| = \sqrt{2 x_1^2 + 3 x_2^2}$; (b) the norm based on the positive definite matrix $\begin{pmatrix} 3 & -1 \\ -1 & 2 \end{pmatrix}$.

5.3.11. Explain why the quantity inside the square root in (5.25) is always non-negative.

5.3.12. Find the closest point to the vector $\mathbf{b} = (1, 0, 2)^T$ belonging to the two-dimensional subspace spanned by the orthogonal vectors $\mathbf{v}_1 = (1, -1, 1)^T$, $\mathbf{v}_2 = (-1, 1, 2)^T$.

5.3.13. Let $\mathbf{b} = (0, 3, 1, 2)^T$. Find the vector $\mathbf{w}^* \in \operatorname{span} \{\, (0, 0, 1, 1)^T, (2, 1, 1, -1)^T \,\}$ such that $\|\mathbf{w}^* - \mathbf{b}\|$ is minimized.


5.3.14.  Find the closest point to $b = (1, 2, -1, 3)^T$ in the subspace
$W = \operatorname{span}\{ (1, 0, 2, 1)^T, (1, 1, 0, -1)^T, (2, 0, 1, -1)^T \}$ by first constructing an orthogonal basis of $W$ and then
applying the orthogonal projection formula (4.42).

5.3.15.  Repeat Exercise 5.3.14 using the weighted norm $\| v \|^2 = v_1^2 + 2 v_2^2 + v_3^2 + 3 v_4^2$.

♦ 5.3.16.  Justify the formulas in (5.29).


5.4  Least  Squares 

As  we  first  observed  in  Section  5.1,  the  solution  to  the  closest  point  problem  also  solves 
the  basic  least  squares  minimization  problem.  Let  us  first  officially  define  the  notion  of  a 
(classical)  least  squares  solution  to  a  linear  system. 


Definition  5.10.  A least squares solution to a linear system of equations

\[ A x = b \tag{5.33} \]

is a vector $x^* \in \mathbb{R}^n$ that minimizes the squared Euclidean norm $\| A x - b \|^2$.

If the system (5.33) actually has a solution, then it is automatically the least squares
solution.  The concept of least squares solution is new only when the system does not have
a solution, i.e., $b$ does not lie in the image of $A$.  We also want the least squares solution
to be unique.  As with an ordinary solution, this happens if and only if $\ker A = \{0\}$,
or, equivalently, the columns of $A$ are linearly independent, or, equivalently, $\operatorname{rank} A = n$.
Indeed, if $z \in \ker A$, then $\tilde{x} = x + z$ also satisfies

\[ \| A \tilde{x} - b \|^2 = \| A (x + z) - b \|^2 = \| A x - b \|^2, \]

and hence is also a minimum.  Thus, uniqueness requires $z = 0$.

As before, to make the connection with the closest point problem, we identify the
subspace $W = \operatorname{img} A \subset \mathbb{R}^m$ as the image or column space of the matrix $A$.  If the columns
of $A$ are linearly independent, then they form a basis for the image $W$.  Since every
element of the image can be written as $w = A x$, minimizing $\| A x - b \|^2$ is the same as
minimizing the distance $\| w - b \|$ between the point and the subspace.  The solution $x^*$ to
the quadratic minimization problem produces the closest point $w^* = A x^*$ in $W = \operatorname{img} A$,
which is thus found using Theorem 5.5.  In the Euclidean case, we therefore find the least
squares solution by solving the normal equations given in (5.27).


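Theorem  5.11.  Assume that $\ker A = \{0\}$.  Then the least squares solution to the linear
system $A x = b$ under the Euclidean norm is the unique solution $x^*$ to the normal equations

\[ (A^T A)\, x = A^T b, \qquad \text{namely} \qquad x^* = (A^T A)^{-1} A^T b. \tag{5.34} \]

The least squares error is

\[ \| A x^* - b \| = \sqrt{ \| b \|^2 - f^T x^* } = \sqrt{ \| b \|^2 - b^T A (A^T A)^{-1} A^T b }, \qquad \text{where} \quad f = A^T b. \tag{5.35} \]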


Note that the normal equations (5.27) can be simply obtained by multiplying the original
system $A x = b$ on both sides by $A^T$.  In particular, if $A$ is square and invertible, then
$(A^T A)^{-1} = A^{-1} (A^T)^{-1}$, and so the least squares solution formula (5.34) reduces to
$x = A^{-1} b$, while the two terms under the square root in the error formula (5.35) cancel out,
producing zero error.  In the rectangular case, when inversion of $A$ itself is not allowed,
(5.34) gives a new formula for the solution to the linear system $A x = b$ whenever
$b \in \operatorname{img} A$.  See also the discussion concerning the pseudoinverse of a matrix in Section 8.7
for an alternative approach.


Example  5.12.  Consider the linear system

\[ \begin{aligned} x_1 + 2 x_2 &= 1, \\ 3 x_1 - x_2 + x_3 &= 0, \\ -x_1 + 2 x_2 + x_3 &= -1, \\ x_1 - x_2 - 2 x_3 &= 2, \\ 2 x_1 + x_2 - x_3 &= 2, \end{aligned} \]

consisting of 5 equations in 3 unknowns.  The coefficient matrix and right-hand side are

\[ A = \begin{pmatrix} 1 & 2 & 0 \\ 3 & -1 & 1 \\ -1 & 2 & 1 \\ 1 & -1 & -2 \\ 2 & 1 & -1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 0 \\ -1 \\ 2 \\ 2 \end{pmatrix}. \]

A direct application of Gaussian Elimination shows that $b \notin \operatorname{img} A$, and so the system is
incompatible: it has no solution.  Of course, to apply the least squares method, we are
not required to check this in advance.  If the system has a solution, it is the least squares
solution too, and the least squares method will find it.

Let us find the least squares solution based on the Euclidean norm, so that $C = I$ in
Theorem 5.13.  According to (5.26),

\[ K = A^T A = \begin{pmatrix} 16 & -2 & -2 \\ -2 & 11 & 2 \\ -2 & 2 & 7 \end{pmatrix}, \qquad f = A^T b = \begin{pmatrix} 8 \\ 0 \\ -7 \end{pmatrix}. \]

Solving the $3 \times 3$ system of normal equations $K x = f$ by Gaussian Elimination, we find

\[ x^* = K^{-1} f \approx ( .4119, \ .2482, \ -.9532 )^T \]

to be the least squares solution to the system.  The least squares error is

\[ \| b - A x^* \|^2 \approx \| ( -.0917, \ .0342, \ .1313, \ .0701, \ .0252 )^T \|^2 \approx .03236, \]

which is reasonably small, indicating that the system is, roughly speaking, not too
incompatible.

An  alternative  strategy  is  to  begin  by  orthonormalizing  the  columns  of  A  using  Gram- 
Schmidt.  We  can  then  apply  the  orthogonal  projection  formula  (4.41)  to  construct  the 
same  least  squares  solution.  Details  of  the  latter  computation  are  left  to  the  reader. 
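
As a numerical cross-check of Example 5.12, the following sketch forms the Gram matrix and right-hand side with NumPy, solves the normal equations, and compares the result with the library's own least squares routine. The variable names are illustrative choices, not notation fixed by the text.

```python
import numpy as np

# Coefficient matrix and right-hand side of Example 5.12.
A = np.array([[ 1,  2,  0],
              [ 3, -1,  1],
              [-1,  2,  1],
              [ 1, -1, -2],
              [ 2,  1, -1]], dtype=float)
b = np.array([1, 0, -1, 2, 2], dtype=float)

K = A.T @ A                      # Gram matrix K = A^T A
f = A.T @ b                      # right-hand side f = A^T b
x_star = np.linalg.solve(K, f)   # solve the normal equations K x = f

residual = A @ x_star - b
squared_error = residual @ residual

# Independent check: NumPy's least squares solver works on A directly.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print("x* =", x_star)
print("squared error =", squared_error)
```

Forming $A^T A$ explicitly is fine for a small system like this one; for larger or poorly conditioned problems, a routine that works on $A$ itself, such as `np.linalg.lstsq`, is usually the safer choice.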

One can extend the basic least squares method by introducing a suitable weighted
norm in the measurement of the error.  Let $C > 0$ be a positive definite matrix that
governs the weighted norm $\| v \|^2 = v^T C v$.  In most applications, $C = \operatorname{diag}(c_1, \ldots, c_m)$
is a diagonal matrix whose entries are the assigned weights of the individual coordinates,
but the method works equally well for general norms defined by positive definite matrices.
The off-diagonal entries of $C$ can be used to weight cross-correlations between data values,
although this extra freedom is rarely used in practice.  The weighted least squares solution
is thus obtained by solving the corresponding weighted normal equations (5.30), as follows.

Theorem  5.13.  Suppose $A$ is an $m \times n$ matrix such that $\ker A = \{0\}$, and suppose $C > 0$
is any positive definite $m \times m$ matrix specifying the weighted norm $\| v \|^2 = v^T C v$.  Then
the least squares solution to the linear system $A x = b$ that minimizes the weighted squared
error $\| A x - b \|^2$ is the unique solution $x^*$ to the weighted normal equations

\[ A^T C A \, x = A^T C \, b, \qquad \text{so that} \qquad x^* = (A^T C A)^{-1} A^T C \, b. \tag{5.36} \]

The weighted least squares error is

\[ \| A x^* - b \| = \sqrt{ \| b \|^2 - f^T x^* } = \sqrt{ \| b \|^2 - b^T C A (A^T C A)^{-1} A^T C \, b }, \qquad \text{where} \quad f = A^T C \, b. \tag{5.37} \]
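
The weighted normal equations (5.36) translate almost line by line into code. The sketch below is a minimal illustration, not a production routine: the function name `weighted_least_squares` and the particular weights are hypothetical, and the data reuse the system of Example 5.12 purely for convenience. It also checks that the squared weighted error equals $\| b \|^2 - f^T x^*$ with $\| b \|^2 = b^T C b$.

```python
import numpy as np

def weighted_least_squares(A, b, C):
    """Solve (A^T C A) x = A^T C b; return (x_star, squared weighted error)."""
    K = A.T @ C @ A
    f = A.T @ C @ b
    x_star = np.linalg.solve(K, f)
    # Squared weighted error: ||b||^2 - f^T x*, where ||b||^2 = b^T C b.
    squared_error = b @ C @ b - f @ x_star
    return x_star, squared_error

# System of Example 5.12, with made-up (hypothetical) diagonal weights.
A = np.array([[ 1,  2,  0],
              [ 3, -1,  1],
              [-1,  2,  1],
              [ 1, -1, -2],
              [ 2,  1, -1]], dtype=float)
b = np.array([1, 0, -1, 2, 2], dtype=float)
C = np.diag([1.0, 2.0, 1.0, 0.5, 3.0])

x_star, err2 = weighted_least_squares(A, b, C)

# Direct evaluation of the weighted squared error for comparison.
r = A @ x_star - b
err2_direct = r @ C @ r
print(x_star, err2, err2_direct)
```

Setting `C = np.eye(5)` recovers the ordinary Euclidean least squares solution.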


Exercises 


Note :  Unless  otherwise  indicated,  use  the  Euclidean  norm  to  measure  the  least  squares  error. 


5.4.1.  Find  the  least  squares  solution  to  the  linear  system  Ax  =  b  when 


m 

m 

(l  0\ 

m  ( 

(a)  A  = 

2  ,  b  = 

i 

,  (b)A  = 

2  -1  ,  b  = 

3  ,  (c)A  = 

0/ 

u  5 ; 

\ 
5.4.2.  Find the least squares solutions to the following linear systems:

(a)  $x + 2y = 1$,  $3x - y = 0$,  $-x + 2y = 3$,

(b)  $4x - 2y = 1$,  $2x + 3y = -4$,  $x - 2y = -1$,  $2x + 2y = 2$,

(c)  $2u + v - 2w = 1$,  $3u - 2w = 0$,  $u - v + 3w = 2$,

(d)  $x - z = -1$,  $2x - y + 3z = 1$,  $y - 3z = 0$,  $-5x + 2y + z = 3$.


(e)  x1  +  x2  =  2,  x2  +  x4  =  1,  x1  +  x3  =  0. 


x< 


x4  =  1, 


x 


x4  =  2. 


/  3  -3 


5.4.3.  Let  A  = 


2 

Vi 


4 

2 


b  = 


/  6\ 

5 

W 


Prove,  using  Gaussian  Elimination,  that  the  linear 


system  Ax  =  b  has  a  unique  solution.  Show  that  the  least  squares  solution  (5.34)  is  the 
same.  Explain  why  this  is  necessarily  the  case. 

5.4.4.  Find  the  least  squares  solution  to  the  linear  system  Ax  =  b  when 


(a)  A  = 


(2  3  \ 

4  -2 

1  5 

,  b  = 

-1 

1 

,  (b)  A  = 

(2  1  4  \ 

1-2  1 

1  0  -3 

,  b  = 

H-l  O  O 

V  2  0^ 

3  / 

U  2  —2  / 

Vo  J 

5.4.5.  Given
$A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 3 \\ 1 & 5 & 1 \\ 3 & 1 & 1 \end{pmatrix}$ and
$b = \begin{pmatrix} 0 \\ 5 \\ 6 \\ 8 \end{pmatrix}$,
find the least squares solution to the system $A x = b$.  What is the error?  Interpret your result.

5.4.6.  Find the least squares solution to the linear systems in Exercise 5.4.1 under the weighted
norm $\| x \|^2 = x_1^2 + 2 x_2^2 + 3 x_3^2$.


♦ 5.4.7.  Let $A$ be an $m \times n$ matrix with $\ker A = \{0\}$.  Suppose that we use the Gram-Schmidt
algorithm to factor $A = Q R$ as in Exercise 4.3.32.  Prove that the least squares solution to
the linear system $A x = b$ is found by solving the triangular system $R x = Q^T b$ by Back
Substitution.


5.4.8.  Apply  the  method  in  Exercise  5.4.7  to  find  the  least  squares  solutions  to  the  systems  in 
Exercise  5.4.2. 


0  5.4.9.  (a)  Find  a  formula  for  the  least  squares  error  (5.35)  in  terms  of  an  orthonormal  basis  of 
the  subspace,  (b)  Generalize  your  formula  to  the  case  of  an  orthogonal  basis. 


5.4.10.  Find  the  least  squares  solutions  to  the  following  linear  systems.  Hint :  Check 


i  i 

~1\ 

(*) 

(  B 

orthogonality  of  the  columns  of  the  coefficient  matrix.  (a)  2 

2 

=  o  , 

V  3 

-1 ) 

\yj 

l-i/ 

/  3  -1\ 

/  2  \ 

0  2 

Ui  - 

1 

-2  1 

-1 

^  1  5  / 

^  l) 

/ 


(c) 


V 


1 

1 

2 

1 

0 


-1 

3 

1 

0 

7 


-1\ 

2 

0 

-1 

-1/ 


(  x^ 
V 

w 


/-1\ 


0 

1 

-1 


V  o  / 


♦ 5.4.11.  Suppose we are interested in solving a linear system $A x = b$ by the method of least
squares when the coefficient matrix $A$ has linearly dependent columns.  Let $K x = f$,
where $K = A^T C A$, $f = A^T C b$, be the corresponding normal equations.  (a) Prove
that $f \in \operatorname{img} K$, and so the normal equations have a solution.  Hint:  Use Exercise 3.4.32.
(b) Prove that every solution to the normal equations minimizes the least squares error,
and hence qualifies as a least squares solution to the original system.  (c) Explain why the
least squares solution is not unique.


♦ 5.4.12.  Which is the more efficient algorithm: direct least squares based on solving the normal
equations by Gaussian Elimination, or using Gram-Schmidt orthonormalization and then
solving the resulting triangular system by Back Substitution as in Exercise 5.4.7?  Justify
your answer.

5.4.13.  A group of students knows that the least squares solution to $A x = b$ can be identified
with the closest point on the subspace $\operatorname{img} A$ spanned by the columns of the coefficient
matrix.  Therefore, they try to find the solution by first orthonormalizing the columns using
Gram-Schmidt, and then finding the least squares coefficients by the orthonormal basis
formula (4.41).  To their surprise, they do not get the same solution!  Can you explain the
source of their difficulty?  How can you use their solution to obtain the proper least squares
solution $x^*$?  Check your algorithm with the system that we treated in Example 5.12.


5.5  Data  Fitting  and  Interpolation 

One of the most important applications of the least squares minimization process is to
the fitting of data points.  Suppose we are running an experiment in which we measure a
certain time-dependent physical quantity.  At time $t_i$ we make the measurement $y_i$, and
thereby obtain a set of, say, $m$ data points

\[ (t_1, y_1), \quad (t_2, y_2), \quad \ldots, \quad (t_m, y_m). \tag{5.38} \]

Suppose our theory indicates that all the data points are supposed to lie on a single line

\[ y = \alpha + \beta t, \tag{5.39} \]

whose precise form, meaning its coefficients $\alpha, \beta$, is to be determined.  For example,
a police car is interested in clocking the speed of a vehicle by using measurements of its
relative distance at several times.  Assuming that the vehicle is traveling at constant speed,
its position at time $t$ will have the linear form (5.39), with $\beta$, the velocity, and $\alpha$, the initial
position, to be determined.  The amount by which $\beta$ exceeds the speed limit will determine
whether the police decide to give chase.  Experimental error will almost inevitably make
this measurement impossible to achieve exactly, and so the problem is to find the straight
line (5.39) that “best fits” the measured data and then use its slope to estimate the vehicle’s
velocity.

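At the time $t = t_i$, the error between the measured value $y_i$ and the sample value
predicted by the function (5.39) is

\[ e_i = y_i - (\alpha + \beta t_i), \qquad i = 1, \ldots, m. \]

We can write this system of equations in the compact vectorial form

\[ e = y - A x, \]

where

\[ e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}, \qquad \text{while} \qquad A = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{pmatrix}, \qquad x = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}. \tag{5.40} \]

We call $e \in \mathbb{R}^m$ the error vector and $y \in \mathbb{R}^m$ the data vector.  The $m \times 2$ matrix $A$ is
prescribed by the sample times.  The coefficients $\alpha, \beta$ of our desired function (5.39) are the
unknowns, forming the entries of the column vector $x \in \mathbb{R}^2$.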

If we could fit the data exactly, so $y_i = \alpha + \beta t_i$ for all $i$, then each error would vanish,
$e_i = 0$, and we could solve the linear system $A x = y$ for the coefficients $\alpha, \beta$.  In the
language of linear algebra, the data points all lie on a straight line if and only if $y \in \operatorname{img} A$.

Figure  5.5.  Least Squares Approximation of Data by a Straight Line.

If the data points are not collinear, then we seek the straight line that minimizes the total
squared error:

\[ \text{Squared Error} = \| e \|^2 = e_1^2 + \cdots + e_m^2, \]

which coincides with the squared Euclidean norm of the error vector.  Pictorially, referring
to Figure 5.5, the errors are the vertical distances from the points to the line, and we
are seeking to minimize the sum of the squares of the individual errors†, hence the term
least squares.  In other words, we are looking for the coefficient vector $x = (\alpha, \beta)^T$ that
minimizes the Euclidean norm of the error vector

\[ \| e \| = \| A x - y \|. \tag{5.41} \]

Thus, we have a manifestation of the problem of characterizing the least squares solution
to the linear system $A x = y$.

Theorem 5.11 prescribes the solution to this least squares minimization problem.  We
form the normal equations

\[ (A^T A)\, x = A^T y, \qquad \text{with solution} \qquad x^* = (A^T A)^{-1} A^T y. \tag{5.42} \]

Invertibility of the Gram matrix $K = A^T A$ relies on the assumption that the matrix $A$ has
linearly independent columns.  For the particular matrix in (5.40), linear independence of
its two columns requires that not all the $t_i$'s be equal, i.e., we must measure the data at
two or more distinct times.  Note that this restriction does not preclude measuring some of
the data at the same time, e.g., by repeating the experiment.  However, choosing all the
$t_i$'s to be the same is a silly data fitting problem.  (Why?)


† This choice of minimization may strike the reader as a little odd.  Why not just minimize the
sum of the absolute value of the errors, i.e., the 1 norm $\| e \|_1 = |e_1| + \cdots + |e_m|$ of the error vector,
or minimize the maximal error, i.e., the $\infty$ norm $\| e \|_\infty = \max\{ |e_1|, \ldots, |e_m| \}$?  The answer is
that, although each of these alternative minimization criteria is interesting and potentially useful,
they all lead to nonlinear minimization problems, and so are much harder to solve!  The least
squares minimization problem can be solved by linear algebra, and so, purely on the grounds of
simplicity, is the method of choice in most applications.  Moreover, as always, one needs to fully
understand the linear problem before diving into more treacherous nonlinear waters.  Or, even
better, why minimize the vertical distance to the line?  The shortest distance from each data point
to the line, as measured along the perpendicular and explicitly computed in Exercise 5.1.8, might
strike you as a better measure of error.  To solve the latter problem, see Section 8.8 on Principal
Component Analysis, particularly Exercise 8.8.11.

Under this assumption, we then compute

\[ A^T A = \begin{pmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{pmatrix} = m \begin{pmatrix} 1 & \bar{t} \\ \bar{t} & \overline{t^2} \end{pmatrix}, \qquad A^T y = \begin{pmatrix} \sum y_i \\ \sum t_i y_i \end{pmatrix} = m \begin{pmatrix} \bar{y} \\ \overline{t y} \end{pmatrix}, \tag{5.43} \]

where the overbars, namely

\[ \bar{t} = \frac{1}{m} \sum_{i=1}^{m} t_i, \qquad \bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i, \qquad \overline{t^2} = \frac{1}{m} \sum_{i=1}^{m} t_i^2, \qquad \overline{t y} = \frac{1}{m} \sum_{i=1}^{m} t_i y_i, \tag{5.44} \]

denote the average sample values of the indicated variables.

Warning.  The average of a product is not equal to the product of the averages!  In
particular, $\overline{t^2} \neq (\bar{t})^2$, $\overline{t y} \neq \bar{t}\, \bar{y}$.

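Substituting (5.43) into the normal equations (5.42), and canceling the common factor
of $m$, we find that we have only to solve the pair of linear equations

\[ \alpha + \beta\, \bar{t} = \bar{y}, \qquad \alpha\, \bar{t} + \beta\, \overline{t^2} = \overline{t y}, \]

for the coefficients:

\[ \alpha = \bar{y} - \beta\, \bar{t}, \qquad \beta = \frac{\overline{t y} - \bar{t}\, \bar{y}}{\overline{t^2} - (\bar{t})^2} = \frac{\sum (t_i - \bar{t})\, y_i}{\sum (t_i - \bar{t})^2}. \tag{5.45} \]

Therefore, the best (in the least squares sense) straight line that fits the given data is

\[ y = \beta (t - \bar{t}) + \bar{y}, \tag{5.46} \]

where the line’s slope $\beta$ is given in (5.45).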
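
Formulas (5.45) and (5.46) are short enough to code directly. The following is a minimal sketch; the function name `least_squares_line` is an illustrative choice, and the sample data are the four points $(0, 2), (1, 3), (3, 7), (6, 12)$ used in Example 5.14. Both expressions for the slope in (5.45) are evaluated so that they can be compared.

```python
import numpy as np

def least_squares_line(t, y):
    """Return (alpha, beta) of the least squares line y = alpha + beta * t."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t_bar, y_bar = t.mean(), y.mean()
    # Slope from the centered-sum form of (5.45).
    beta = np.sum((t - t_bar) * y) / np.sum((t - t_bar) ** 2)
    alpha = y_bar - beta * t_bar
    return alpha, beta

t = np.array([0.0, 1.0, 3.0, 6.0])
y = np.array([2.0, 3.0, 7.0, 12.0])
alpha, beta = least_squares_line(t, y)

# Same slope from the averaged form of (5.45).
beta_avg = ((t * y).mean() - t.mean() * y.mean()) / ((t ** 2).mean() - t.mean() ** 2)

print(alpha, beta, beta_avg)
```

The centered-sum form tends to be the better behaved of the two in floating point, since the averaged form subtracts two quantities that can be nearly equal when the sample times are clustered far from zero.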

More  generally,  one  may  wish  to  assign  different  weights  to  the  measurement  errors. 
Suppose  some  of  the  data  are  known  to  be  more  reliable  or  more  significant  than  others. 
For  example,  measurements  at  an  earlier  time  may  be  more  accurate,  or  more  critical  to 
the  data  fitting  problem,  than  later  measurements.  In  that  situation,  we  should  penalize 
any  errors  in  the  earlier  measurements  and  downplay  errors  in  the  later  data. 

In general, this requires the introduction of a positive weight $c_i > 0$ associated with
each data point $(t_i, y_i)$; the larger the weight, the more vital the error.  For a straight line
approximation $y = \alpha + \beta t$, the weighted squared error is defined as

\[ \text{Weighted Squared Error} = \sum_{i=1}^{m} c_i e_i^2 = e^T C e = \| e \|^2, \]

where $C = \operatorname{diag}(c_1, \ldots, c_m) > 0$ is the positive definite diagonal weight matrix, while $\| e \|$
denotes the associated weighted norm of the error vector $e = y - A x$.  One then applies
the weighted normal equations (5.30) to effect the solution to the problem.

Example  5.14.  Suppose the data points are given by the table

    t_i |  0 |  1 |  3 |  6
    y_i |  2 |  3 |  7 | 12

To find the least squares line, we construct

\[ A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 1 & 6 \end{pmatrix}, \qquad A^T = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 3 & 6 \end{pmatrix}, \qquad y = \begin{pmatrix} 2 \\ 3 \\ 7 \\ 12 \end{pmatrix}. \]

Therefore

\[ A^T A = \begin{pmatrix} 4 & 10 \\ 10 & 46 \end{pmatrix}, \qquad A^T y = \begin{pmatrix} 24 \\ 96 \end{pmatrix}. \]

The normal equations (5.42) reduce to

\[ 4 \alpha + 10 \beta = 24, \qquad 10 \alpha + 46 \beta = 96, \qquad \text{so} \qquad \alpha = \tfrac{12}{7}, \quad \beta = \tfrac{12}{7}. \]

Therefore, the best least squares fit to the data is the straight line

\[ y = \tfrac{12}{7} + \tfrac{12}{7}\, t \approx 1.71429 + 1.71429\, t. \]

Alternatively, one can compute this formula directly from (5.45-46).

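Now, suppose we assign different weights to the preceding data points, e.g., $c_1 = 3$,
$c_2 = 2$, $c_3 = \frac{1}{2}$, $c_4 = \frac{1}{4}$.  Thus, errors in the first two data values are assigned higher
significance than those in the latter two.  To find the weighted least squares line that best
fits the data, we compute

\[ A^T C A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 3 & 6 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & \frac{1}{2} & 0 \\ 0 & 0 & 0 & \frac{1}{4} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 1 & 6 \end{pmatrix} = \begin{pmatrix} \frac{23}{4} & 5 \\ 5 & \frac{31}{2} \end{pmatrix}, \]

\[ A^T C y = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 3 & 6 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & \frac{1}{2} & 0 \\ 0 & 0 & 0 & \frac{1}{4} \end{pmatrix} \begin{pmatrix} 2 \\ 3 \\ 7 \\ 12 \end{pmatrix} = \begin{pmatrix} \frac{37}{2} \\ \frac{69}{2} \end{pmatrix}. \]

Thus, the weighted normal equations (5.30) reduce to

\[ \tfrac{23}{4} \alpha + 5 \beta = \tfrac{37}{2}, \qquad 5 \alpha + \tfrac{31}{2} \beta = \tfrac{69}{2}, \qquad \text{so} \qquad \alpha = 1.7817, \quad \beta = 1.6511. \]

Therefore, the least squares fit to the data under the given weights is

\[ y = 1.7817 + 1.6511\, t. \]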
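
One way to check the weighted fit numerically, sketched below, uses the fact that for a diagonal weight matrix $\sum c_i e_i^2 = \| C^{1/2} (y - A x) \|^2$, so scaling each row of $A$ and each entry of $y$ by $\sqrt{c_i}$ turns the weighted problem into an ordinary one. The weights are taken to be $3, 2, \frac{1}{2}, \frac{1}{4}$, as read off the diagonal of $C$ in the example; the row-scaling route is an alternative to the weighted normal equations used in the text, and the variable names are illustrative.

```python
import numpy as np

t = np.array([0.0, 1.0, 3.0, 6.0])
y = np.array([2.0, 3.0, 7.0, 12.0])
c = np.array([3.0, 2.0, 0.5, 0.25])        # weights c_1, ..., c_4

A = np.column_stack([np.ones_like(t), t])  # rows (1, t_i)

# Route 1: scale rows by sqrt(c_i), then solve an ordinary least squares problem.
s = np.sqrt(c)
coeffs, *_ = np.linalg.lstsq(s[:, None] * A, s * y, rcond=None)
alpha, beta = coeffs

# Route 2: weighted normal equations (A^T C A) x = A^T C y.
C = np.diag(c)
K = A.T @ C @ A
f = A.T @ C @ y
alpha_ne, beta_ne = np.linalg.solve(K, f)

print(alpha, beta)
print(alpha_ne, beta_ne)
```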


Example  5.15.  Suppose we are given a sample of an unknown radioactive isotope.  At
several times $t_i$, we measure the amount $m_i$ of radioactive material remaining in the sample.
The problem is to determine the initial amount of material along with the isotope’s half-life.
If the measurements were exact, we would have $m(t) = m_0 e^{\beta t}$, where $m_0 = m(0)$ is
the initial mass, and $\beta < 0$ the decay rate.  The half-life is given by $t^* = -\beta^{-1} \log 2$; see
Example 8.1 for additional details.

As it stands, this is not a linear least squares problem.  But it can be easily converted
to the proper form by taking logarithms:

\[ y(t) = \log m(t) = \log m_0 + \beta t = \alpha + \beta t, \qquad \text{where} \qquad \alpha = \log m_0. \]

We can thus do a linear least squares fit on the logarithms $y_i = \log m_i$ of the radioactive
mass data at the measurement times $t_i$ to determine the best values for $\alpha$ and $\beta$.
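
The logarithmic conversion can be sketched in a few lines. The measurements below are hypothetical: they are generated from an assumed initial mass of 50 and decay rate $-0.3$, with small made-up multiplicative perturbations standing in for experimental error. The fit is carried out on $\log m_i$, and the initial mass and half-life are then recovered from $\alpha$ and $\beta$.

```python
import numpy as np

# Hypothetical measurement times and masses (synthetic, not from the text).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
noise = np.array([1.01, 0.99, 1.02, 0.98, 1.00, 1.01])
m = 50.0 * np.exp(-0.3 * t) * noise

# Linear least squares fit of y = log m against t:  y = alpha + beta * t.
A = np.column_stack([np.ones_like(t), t])
(alpha, beta), *_ = np.linalg.lstsq(A, np.log(m), rcond=None)

m0 = np.exp(alpha)               # estimated initial mass, m0 = e^alpha
half_life = -np.log(2) / beta    # t* = -(1/beta) log 2, positive since beta < 0

print(m0, beta, half_life)
```

Note that this minimizes the squared error in $\log m$, not in $m$ itself, so it implicitly gives relatively more weight to the smaller measurements than a direct nonlinear fit would.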


Exercises 


5.5.1.  Find the straight line $y = \alpha + \beta t$ that best fits the following data in the least squares
sense:

(a)
    t_i | -2 |  0 |  1 |  3
    y_i |  0 |  1 |  2 |  5

(b)
    t_i |  1 |  2 |  3 |  4 |  5
    y_i |  1 |  0 | -2 | -3 | -3

(c)
    t_i | -2 | -1 |  0 |  1 |  2
    y_i | -5 | -3 | -2 |  0 |  3

5.5.2.  The  proprietor  of  an  internet  travel  company  compiled  the  following  data  relating  the 
annual  profit  of  the  firm  to  its  annual  advertising  expenditure  (both  measured  in  thousands 
of  dollars): 


    Annual advertising expenditure | 12 | 14 | 17 |  21 |  26 |  30
    Annual profit                  | 60 | 70 | 90 | 100 | 100 | 120

(a)  Determine  the  equation  of  the  least  squares  line,  (b)  Plot  the  data  and  the  least 
squares  line,  (c)  Estimate  the  profit  when  the  annual  advertising  budget  is  $50,000. 

(d)  What  about  a  $100,000  budget? 

5.5.3.  The  median  price  (in  thousands  of  dollars)  of  existing  homes  in  a  certain  metropolitan 
area  from  1989  to  1999  was: 


    year  | 1989 | 1990 | 1991 | 1992 | 1993 | 1994  | 1995  | 1996  | 1997  | 1998  | 1999
    price | 86.4 | 89.8 | 92.8 | 96.0 | 99.6 | 103.1 | 106.3 | 109.5 | 113.3 | 120.0 | 129.5

(a)  Find  an  equation  of  the  least  squares  line  for  these  data,  (b)  Estimate  the  median 
price  of  a  house  in  the  year  2005,  and  the  year  2010,  assuming  that  the  trend  continues. 


5.5.4.  A  20-pound  turkey  that  is  at  the  room  temperature  of  72°  is  placed  in  the  oven  at  1:00 
pm.  The  temperature  of  the  turkey  is  observed  in  20  minute  intervals  to  be  79°,  88°,  and 
96°.  A  turkey  is  cooked  when  its  temperature  reaches  165°.  How  much  longer  do  you  need 
to  wait  until  the  turkey  is  done? 

T  5.5.5.  The  amount  of  waste  (in  millions  of  tons  a  day)  generated  in  a  certain  city  from  1960  to 
1995  was 


    year   | 1960 | 1965 | 1970  | 1975 | 1980  | 1985  | 1990  | 1995
    amount | 86   | 99.8 | 115.8 | 125  | 132.6 | 143.1 | 156.3 | 169.5

(a)  Find  the  equation  for  the  least  squares  line  that  best  fits  these  data. 

(b)  Use  the  result  to  estimate  the  amount  of  waste  in  the  year  2000,  and  in  the  year  2005. 

(c)  Redo your calculations using an exponential growth model $y = c\, e^{\alpha t}$.

(d)  Which model do you think most accurately reflects the data?  Why?

5.5.6.  The amount of radium-224 in a sample was measured at the indicated times.

    time in days | 0   | 1    | 2    | 3    | 4    | 5    | 6    | 7
    mg           | 100 | 82.7 | 68.3 | 56.5 | 46.7 | 38.6 | 31.9 | 26.4

(a)  Estimate  how  much  radium  will  be  left  after  10  days. 

(b)  If  the  sample  is  considered  to  be  safe  when  the  amount  of  radium  is  less  than  .01  mg, 
estimate  how  long  the  sample  needs  to  be  stored  before  it  can  be  safely  disposed  of. 

5.5.7.  The  following  table  gives  the  population  of  the  United  States  for  the  years  1900-2000. 


    year                     | 1900 | 1920 | 1940 | 1960 | 1980 | 2000
    population (in millions) | 76   | 106  | 132  | 181  | 227  | 282

(a)  Use an exponential growth model of the form $y = c\, e^{\alpha t}$ to predict the population
in 2020, 2050, and 3000.  (b) The actual population for the year 2020 has recently been
estimated to be 334 million.  How does this affect your predictions for 2050 and 3000?

5.5.8.  Find the best linear least squares fit of the following data using the indicated weights:

(a)
    t_i | 1  | 2  | 3  | 4
    y_i | .2 | .4 | .7 | 1.2
    c_i | 1  | 2  | 3  | 4

(b)
    t_i | 0 | 1 | 3 | 6
    y_i | 2 | 3 | 7 | 12
    c_i | 4 | 3 | 2 | 1

(c)
    t_i | -2 | -1 |  0 | 1 | 2
    y_i | -5 | -3 | -2 | 0 | 3
    c_i |  2 |  1 | .5 | 1 | 2

(d)
    t_i | 1 | 2   | 3   | 4  | 5
    y_i | 2 | 1.3 | 1.1 | .8 | .2
    c_i | 5 | 4   | 3   | 2  | 1

5.5.9.  For the data points

    x | 1 | 1 |  2 |  2 | 3 | 3
    y | 1 | 2 |  1 |  2 | 2 | 4
    z | 3 | 6 | 11 | -2 | 0 | 3

(a) determine the plane $z = a + b x + c y$ that best fits the data in the least squares
sense;  (b) how would you answer the question in part (a) if the plane were constrained to
go through the point $x = 2$, $y = z = 0$?

5.5.10.  For the data points in Exercise 5.5.9, determine the plane $z = \alpha + \beta x + \gamma y$ that fits the
data in the least squares sense when the errors are weighted according to the reciprocal of
the distance of the point $(x_i, y_i, z_i)$ from the origin.

♦ 5.5.11.  Show, by constructing explicit examples, that $\overline{t^2} \neq (\bar{t})^2$ and $\overline{t y} \neq \bar{t}\, \bar{y}$.  Can you find any
data for which either equality is valid?

♦ 5.5.12.  Given points $t_1, \ldots, t_m$, prove $\overline{t^2} - (\bar{t})^2 = \frac{1}{m} \sum_{i=1}^{m} (t_i - \bar{t})^2$, thereby justifying (5.45).


Polynomial  Approximation  and  Interpolation 

The basic least squares philosophy has a variety of different extensions, all interesting and
all useful.  First, we can replace the straight line (5.39) by a parabola defined by a quadratic
function

\[ y = \alpha + \beta t + \gamma t^2. \tag{5.47} \]

For example, Newton’s theory of gravitation says that (in the absence of air resistance) a
falling object obeys the parabolic law (5.47), where $\alpha = h_0$ is the initial height, $\beta = v_0$ is
the initial velocity, and $\gamma = -\frac{1}{2} g$ is minus one half the gravitational constant.  Suppose
we observe a falling body on a new planet, and measure its height $y_i$ at times $t_i$.  Then we
can approximate its initial height, initial velocity, and gravitational acceleration by finding
the parabola (5.47) that best fits the data.  Again, we characterize the least squares fit by
minimizing the sum of the squares of the individual errors $e_i = y_i - y(t_i)$.

Figure  5.6.  Interpolating Polynomials.

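The method can evidently be extended to a completely general polynomial function

\[ y(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_n t^n \tag{5.48} \]

of degree $n$.  The total squared error between the data and the sample values of the function
is equal to

\[ \| e \|^2 = \sum_{i=1}^{m} \bigl[\, y_i - y(t_i) \,\bigr]^2 = \| y - A x \|^2, \tag{5.49} \]

where

\[ A = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^n \\ 1 & t_2 & t_2^2 & \cdots & t_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & t_m & t_m^2 & \cdots & t_m^n \end{pmatrix}, \qquad x = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}. \tag{5.50} \]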


The $m \times (n + 1)$ coefficient matrix is known as a Vandermonde matrix, named after the
eighteenth-century French mathematician, scientist, and musician/musicologist Alexandre-Théophile
Vandermonde, despite the fact that it appears nowhere in his four mathematical
papers!  In particular, if $m = n + 1$, then $A$ is square, and so, assuming invertibility,
we can solve $A x = y$ exactly.  In other words, there is no error, and the solution is an
interpolating polynomial, meaning that it fits the data exactly.  A proof of the following
result can be found at the end of this section.


Lemma  5.16.  If $t_1, \ldots, t_{n+1} \in \mathbb{R}$ are distinct, so $t_i \neq t_j$ for $i \neq j$, then the $(n + 1) \times (n + 1)$
Vandermonde interpolation matrix (5.50) is nonsingular.


This  result  implies  the  basic  existence  theorem  for  interpolating  polynomials. 

Theorem  5.17.  Let $t_1, \ldots, t_{n+1}$ be distinct sample points.  Then, for any prescribed data
$y_1, \ldots, y_{n+1}$, there exists a unique interpolating polynomial $y(t)$ of degree $\leq n$ that has
the prescribed sample values:  $y(t_i) = y_i$ for all $i = 1, \ldots, n + 1$.

Thus,  in  particular,  two  points  will  determine  a  unique  interpolating  line,  three  points 
a  unique  interpolating  parabola,  four  points  an  interpolating  cubic,  and  so  on,  as  sketched 
in  Figure  5.6. 
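
The Vandermonde construction (5.50) covers both least squares fitting and exact interpolation. The sketch below is a minimal illustration: the function name `polynomial_least_squares` and both data sets are hypothetical (the first is generated exactly from an assumed parabola with $h_0 = 100$, $v_0 = 5$, $g = 9.8$; the second is two arbitrary points). NumPy's `np.vander` with `increasing=True` builds the columns $1, t, t^2, \ldots, t^n$ in the order used in the text.

```python
import numpy as np

def polynomial_least_squares(t, y, n):
    """Coefficients (alpha_0, ..., alpha_n) of the degree-n least squares polynomial."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.vander(t, n + 1, increasing=True)   # columns 1, t, t^2, ..., t^n
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

# Hypothetical falling-body heights, exact for h0 = 100, v0 = 5, g = 9.8.
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = 100.0 + 5.0 * t - 4.9 * t ** 2
alpha, beta, gamma = polynomial_least_squares(t, y, 2)
print(alpha, beta, -2.0 * gamma)   # initial height, initial velocity, estimate of g

# With m = n + 1 distinct points the fit interpolates: two points determine a line.
line = polynomial_least_squares([1.0, 3.0], [2.0, 8.0], 1)
print(line)
```

For high degrees or many closely spaced sample points, Vandermonde matrices become badly conditioned, so this direct approach is best kept to low-degree fits.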

The basic ideas of interpolation and least squares fitting of data can be applied to
approximate complicated mathematical functions by much simpler polynomials.  Such
approximation schemes are used in all numerical computations.  Your computer or calculator
is only able to add, subtract, multiply, and divide.  Thus, when you ask it to compute $\sqrt{t}$
or $e^t$ or $\cos t$ or any other non-rational function, the program must rely on an approximation
scheme based on polynomials†.  In the “dark ages” before electronic computers‡, one
would consult precomputed tables of values of the function at particular data points.  If
one needed a value at a non-tabulated point, then some form of polynomial interpolation
would be used to approximate the intermediate value.

Example  5.18.  Suppose  that  we  would  like  to  compute  reasonably  accurate  values  for 

the  exponential  function  et  for  values  of  t  lying  in  the  interval  0  <  t  <  1  by  approximating 
it  by  a  quadratic  polynomial 

p(t)  —  a  +  /3t  +  yt2.  (5.51) 


If  we  choose  3  points,  say  t1  =  0,  t2  =  -5,  t3  —  1,  then  there  is  a  unique  quadratic 
polynomial  (5.51)  that  interpolates  el  at  the  data  points,  i.e.,  p(£j  =  eti  for  i  —  1,2,3.  In 

this  case,  the  coefficient  matrix  (5.50),  namely  A  = 
can  exactly  solve  the  interpolation  equations 


10  0  \ 

I  .5  .25  ,  is  nonsingular,  so  we 

II  1  / 


Ax 


where 


i.  \ 

1.64872 
2.71828  / 


is  the  data  vector,  which  we  assume  we  already  know.  The  solution 


x  = 


p 


\  7  / 

yields  the  interpolating  polynomial 


i.  \ 
.876603 
.841679  / 


p(t)  =  1  +  .876603 1  +  .841679 12. 


(5.52) 


It is the unique quadratic polynomial that agrees with $e^t$ at the three specified data points. See Figure 5.7 for a comparison of the graphs; the first graph shows $e^t$, the second $p(t)$, and the third lays the two graphs on top of each other. Even with such a primitive interpolation scheme, the two functions are quite close. The maximum error, or $L^\infty$ norm, of the difference is

$$\| e^t - p(t) \|_\infty = \max \{\, | e^t - p(t) | \;:\; 0 \le t \le 1 \,\} \approx .01442,$$

with the largest deviation occurring at $t \approx .796$.
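The computation in Example 5.18 can be reproduced with a few lines of code. The following is a minimal sketch in plain Python, not a production solver: the helper `solve` is a hypothetical name for a small Gaussian elimination routine written here for illustration, and the maximum error is only estimated on a uniform grid of 10001 points, which is an assumption of this sketch rather than part of the example.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

nodes = [0.0, 0.5, 1.0]                              # t_1, t_2, t_3
A = [[1.0, t, t * t] for t in nodes]                 # coefficient matrix (5.50)
y = [math.exp(t) for t in nodes]                     # data vector
alpha, beta, gamma = solve(A, y)

def p(t):
    """Quadratic interpolant (5.52)."""
    return alpha + beta * t + gamma * t * t

# Estimate the maximum error on a fine grid of [0, 1].
grid = [i / 10000 for i in range(10001)]
max_err, t_worst = max((abs(math.exp(t) - p(t)), t) for t in grid)
print(alpha, beta, gamma)
print(max_err, t_worst)
```

The printed coefficients should agree with (5.52) to the digits shown there, and the grid estimate of the maximum error should be close to the value quoted above.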


There is, in fact, an explicit formula for the interpolating polynomial that is named after the influential eighteenth century Italian-French mathematician Joseph-Louis Lagrange. It relies on the basic superposition principle for solving inhomogeneous systems that we found in Theorem 2.44. Specifically, suppose we know the solutions $\mathbf{x}_1, \dots, \mathbf{x}_{n+1}$ to the particular interpolation systems

$$A\,\mathbf{x}_k = \mathbf{e}_k, \qquad k = 1, \dots, n+1, \tag{5.53}$$


† Actually, since division also is possible, one could also allow interpolation and approximation by rational functions, a subject known as Padé approximation theory, [3].

‡ Back then, the word “computer” referred to a human who computed, mostly female and including the first author’s mother, Grace Olver.
Figure 5.7. Quadratic Interpolating Polynomial for $e^t$.


where $\mathbf{e}_1, \dots, \mathbf{e}_{n+1}$ are the standard basis vectors of $\mathbb{R}^{n+1}$. Then the solution to

$$A\,\mathbf{x} = \mathbf{y} = y_1 \mathbf{e}_1 + \cdots + y_{n+1} \mathbf{e}_{n+1}$$

is given by the superposition formula

$$\mathbf{x} = y_1 \mathbf{x}_1 + \cdots + y_{n+1} \mathbf{x}_{n+1}.$$

The particular interpolation equation (5.53) corresponds to the interpolation data $\mathbf{y} = \mathbf{e}_k$, meaning that $y_k = 1$, while $y_i = 0$ at all points $t_i$ with $i \ne k$. If we can find the $n + 1$ particular interpolating polynomials that realize this very special data, we can use superposition to construct the general interpolating polynomial.


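Theorem 5.19. Given distinct sample points $t_1, \dots, t_{n+1}$, the $k$th Lagrange interpolating polynomial is given by

$$L_k(t) = \frac{(t - t_1) \cdots (t - t_{k-1})(t - t_{k+1}) \cdots (t - t_{n+1})}{(t_k - t_1) \cdots (t_k - t_{k-1})(t_k - t_{k+1}) \cdots (t_k - t_{n+1})}, \qquad k = 1, \dots, n+1. \tag{5.54}$$

It is the unique polynomial of degree $n$ that satisfies

$$L_k(t_i) = \begin{cases} 1, & i = k, \\ 0, & i \ne k, \end{cases} \qquad i, k = 1, \dots, n+1. \tag{5.55}$$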


Proof: The uniqueness of the Lagrange interpolating polynomial is an immediate consequence of Theorem 5.17. To show that (5.54) is the correct formula, we note that when $t = t_i$ for any $i \ne k$, the factor $(t - t_i)$ in the numerator of $L_k(t)$ vanishes, while the denominator is not zero, since the points are distinct: $t_i \ne t_k$ for $i \ne k$. On the other hand, when $t = t_k$, the numerator and denominator are equal, and so $L_k(t_k) = 1$. Q.E.D.

Theorem 5.20. If $t_1, \dots, t_{n+1}$ are distinct, then the polynomial of degree $\le n$ that interpolates the associated data $y_1, \dots, y_{n+1}$ is

$$p(t) = y_1 L_1(t) + \cdots + y_{n+1} L_{n+1}(t). \tag{5.56}$$

Proof: We merely compute

$$p(t_k) = y_1 L_1(t_k) + \cdots + y_k L_k(t_k) + \cdots + y_{n+1} L_{n+1}(t_k) = y_k,$$

where, according to (5.55), every summand except the $k$th is zero. Q.E.D.
Figure 5.8. Lagrange Interpolating Polynomials for the Points 0, .5, 1.


Example 5.21. For example, the three quadratic Lagrange interpolating polynomials for the values $t_1 = 0$, $t_2 = \frac12$, $t_3 = 1$, used to interpolate $e^t$ in Example 5.18 are

$$\begin{aligned}
L_1(t) &= \frac{(t - \frac12)(t - 1)}{(0 - \frac12)(0 - 1)} = 2t^2 - 3t + 1, \\
L_2(t) &= \frac{(t - 0)(t - 1)}{(\frac12 - 0)(\frac12 - 1)} = -4t^2 + 4t, \\
L_3(t) &= \frac{(t - 0)(t - \frac12)}{(1 - 0)(1 - \frac12)} = 2t^2 - t.
\end{aligned} \tag{5.57}$$

Thus, we can rewrite the quadratic interpolant (5.52) to $e^t$ as

$$\begin{aligned}
p(t) &= L_1(t) + e^{1/2} L_2(t) + e\,L_3(t) \\
&= (2t^2 - 3t + 1) + 1.64872\,(-4t^2 + 4t) + 2.71828\,(2t^2 - t).
\end{aligned}$$

We stress that this is the same interpolating polynomial; we have merely rewritten it in the alternative Lagrange form.
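The Lagrange formulas (5.54) and (5.56) translate almost directly into code. The sketch below is illustrative only: the function names `lagrange_basis` and `lagrange_interpolant` are hypothetical, indices are 0-based (so `k = 0` corresponds to $L_1$), and no attempt is made at the numerically more robust barycentric form.

```python
import math

def lagrange_basis(nodes, k, t):
    """Evaluate the Lagrange polynomial (5.54) at t; k is a 0-based index
    and the entries of nodes are assumed to be distinct."""
    value = 1.0
    for i, t_i in enumerate(nodes):
        if i != k:
            value *= (t - t_i) / (nodes[k] - t_i)
    return value

def lagrange_interpolant(nodes, values, t):
    """Evaluate the interpolating polynomial (5.56) at t."""
    return sum(y_k * lagrange_basis(nodes, k, t)
               for k, y_k in enumerate(values))

nodes = [0.0, 0.5, 1.0]
values = [math.exp(t) for t in nodes]

# L_1(t) = 2 t^2 - 3 t + 1 from (5.57), checked at t = 0.25
print(lagrange_basis(nodes, 0, 0.25), 2 * 0.25**2 - 3 * 0.25 + 1)

# The Lagrange form and the monomial form (5.52) describe the same polynomial
print(lagrange_interpolant(nodes, values, 0.25))
print(1 + 0.876603 * 0.25 + 0.841679 * 0.25**2)
```

Evaluating the basis functions at the nodes themselves reproduces the defining property (5.55), which is a quick sanity check when experimenting with other node sets.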


You might expect that the higher the degree, the more accurate the interpolating polynomial. This expectation turns out, unfortunately, not to be uniformly valid. While low-degree interpolating polynomials are usually reasonable approximants to functions, not only are high-degree interpolants more expensive to compute, but they can be rather badly behaved, particularly near the ends of the interval. For example, Figure 5.9 displays the degree 2, 4, and 10 interpolating polynomials for the function $1/(1 + t^2)$ on the interval $-3 \le t \le 3$ using equally spaced data points. Note the rather poor approximation of the function near the ends of the interval. Higher degree interpolants fare even worse, although the bad behavior becomes more and more concentrated near the endpoints. (Interestingly, this behavior is a consequence of the positions of the complex singularities of the function being interpolated, [62].) As a consequence, high-degree polynomial interpolation tends not to be used in practical applications. Better alternatives rely on least squares approximants by low-degree polynomials, to be described next, and interpolation by piecewise cubic splines, to be discussed at the end of this section.
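The poor behavior near the endpoints can be observed numerically. The following sketch interpolates $1/(1 + t^2)$ at equally spaced points on $[-3, 3]$ for degrees 2, 4 and 10 and estimates the maximum error on a uniform grid. The helper names and the grid size of 6001 points are assumptions made for this illustration.

```python
def interpolate(nodes, values, t):
    """Evaluate the Lagrange form (5.56) of the interpolant at t."""
    total = 0.0
    for k, (t_k, y_k) in enumerate(zip(nodes, values)):
        term = y_k
        for i, t_i in enumerate(nodes):
            if i != k:
                term *= (t - t_i) / (t_k - t_i)
        total += term
    return total

def f(t):
    return 1.0 / (1.0 + t * t)

def max_error(degree, a=3.0, samples=6001):
    """Largest |f - p| found on a uniform grid of [-a, a], with its location."""
    nodes = [-a + 2 * a * i / degree for i in range(degree + 1)]
    values = [f(t) for t in nodes]
    grid = [-a + 2 * a * j / (samples - 1) for j in range(samples)]
    return max((abs(f(t) - interpolate(nodes, values, t)), t) for t in grid)

errors = {n: max_error(n) for n in (2, 4, 10)}
for n, (err, where) in errors.items():
    print(n, round(err, 4), round(where, 3))
```

On this grid the degree 10 interpolant has a larger maximum error than the degree 4 interpolant, and its worst point lies between the last two nodes at either end of the interval, in line with the discussion above.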

Figure 5.9. Degree 2, 4 and 10 Interpolating Polynomials for $1/(1 + t^2)$.

If we have $m > n + 1$ data points, then, usually, there is no degree $n$ polynomial that fits all the data, and so we must switch over to a least squares approximation. The first requirement is that the associated $m \times (n + 1)$ interpolation matrix (5.50) have rank $n + 1$; this follows from Lemma 5.16 (coupled with Exercise 2.5.43), provided that at least $n + 1$ of the values $t_1, \dots, t_m$ are distinct. Thus, given data at $m \ge n + 1$ distinct sample points $t_1, \dots, t_m$, we can uniquely determine the least squares polynomial of degree $n$ that best fits the data by solving the normal equations (5.42).

Example 5.22. Let us return to the problem of approximating the exponential function $e^t$. If we use more than three data points, but still require a quadratic polynomial, then we can no longer interpolate exactly, and must devise a least squares approximant. For instance, using five equally spaced sample points


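$$t_1 = 0, \quad t_2 = .25, \quad t_3 = .5, \quad t_4 = .75, \quad t_5 = 1,$$

the coefficient matrix and sampled data vector (5.50) are

$$A = \begin{pmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \\ 1 & t_4 & t_4^2 \\ 1 & t_5 & t_5^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & .25 & .0625 \\ 1 & .5 & .25 \\ 1 & .75 & .5625 \\ 1 & 1 & 1 \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} e^{t_1} \\ e^{t_2} \\ e^{t_3} \\ e^{t_4} \\ e^{t_5} \end{pmatrix} = \begin{pmatrix} 1. \\ 1.28403 \\ 1.64872 \\ 2.11700 \\ 2.71828 \end{pmatrix}.$$

The solution to the normal equations (5.27), with

$$K = A^T A = \begin{pmatrix} 5. & 2.5 & 1.875 \\ 2.5 & 1.875 & 1.5625 \\ 1.875 & 1.5625 & 1.38281 \end{pmatrix}, \qquad \mathbf{f} = A^T \mathbf{y} = \begin{pmatrix} 8.76803 \\ 5.45140 \\ 4.40153 \end{pmatrix},$$

is

$$\mathbf{x} = K^{-1} \mathbf{f} = (\, 1.00514,\ .864277,\ .843538 \,)^T.$$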
This leads to the quadratic least squares approximant

$$p_2(t) = 1.00514 + .864277\,t + .843538\,t^2.$$

On the other hand, the quartic interpolating polynomial

$$p_4(t) = 1 + .998803\,t + .509787\,t^2 + .140276\,t^3 + .069416\,t^4$$

is found directly from the data values as above. The quadratic polynomial has a maximal error of $\approx .011$ over the interval $[0, 1]$, slightly better than our previous quadratic interpolant, while the quartic has a significantly smaller maximal error: $\approx .0000527$. (In this case, high-degree interpolants are not ill behaved.) See Figure 5.10 for a comparison of the graphs, and Example 5.24 below for further discussion.

Figure 5.10. Quadratic Approximant and Quartic Interpolant for $e^t$.
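The normal equations of Example 5.22 are small enough to assemble and solve by hand, but the same steps are easy to script. This is a minimal sketch under stated assumptions: `solve` is a hypothetical helper implementing Gaussian elimination with partial pivoting, and the maximum error is estimated on a uniform grid rather than computed exactly.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

t = [0.0, 0.25, 0.5, 0.75, 1.0]                  # five sample points
y = [math.exp(ti) for ti in t]                   # sampled data vector
A = [[1.0, ti, ti * ti] for ti in t]             # 5 x 3 matrix (5.50)

m, n = len(A), len(A[0])
K = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in range(n)]
     for i in range(n)]                          # K = A^T A
f = [sum(A[r][i] * y[r] for r in range(m)) for i in range(n)]   # f = A^T y
x = solve(K, f)                                  # least squares coefficients

def p2(s):
    return x[0] + x[1] * s + x[2] * s * s

grid = [i / 10000 for i in range(10001)]
max_err = max(abs(math.exp(s) - p2(s)) for s in grid)
print(K)
print(x, max_err)
```

Forming $K = A^T A$ explicitly is fine for a problem of this size; for larger or badly conditioned problems an orthogonalization-based approach is usually preferable.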

As noted above, the required calculations can be significantly simplified by the introduction of an orthogonal basis of the least squares subspace. Let us see how this works in the present situation. Given sample points $t_1, \dots, t_m$, let

$$\mathbf{t}_k = (\, t_1^k,\ t_2^k,\ \dots,\ t_m^k \,)^T, \qquad k = 0, 1, 2, \dots,$$

be the vector obtained by sampling the monomial $t^k$. More generally, sampling a polynomial, i.e., a linear combination of monomials

$$y = p(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_n t^n \tag{5.58}$$

results in the selfsame linear combination

$$\mathbf{p} = (\, p(t_1), \dots, p(t_m) \,)^T = \alpha_0 \mathbf{t}_0 + \alpha_1 \mathbf{t}_1 + \cdots + \alpha_n \mathbf{t}_n \tag{5.59}$$

of monomial sample vectors. Thus, all sampled polynomial vectors belong to the subspace $W = \operatorname{span}\{ \mathbf{t}_0, \dots, \mathbf{t}_n \} \subset \mathbb{R}^m$ spanned by the monomial sample vectors.

Let $\mathbf{y} = (\, y_1, y_2, \dots, y_m \,)^T$ be data that has been measured at the sample points. The polynomial least squares approximation is, by definition, the polynomial $y = p(t)$ whose sample vector $\mathbf{p}$ is the closest point to $\mathbf{y}$ lying in the subspace $W$, which, according to Theorem 5.7, is the same as the orthogonal projection of the data vector $\mathbf{y}$ onto $W$. But the monomial sample vectors $\mathbf{t}_0, \dots, \mathbf{t}_n$ are not orthogonal, and so a direct approach requires solving the normal equations (5.42) for the least squares coefficients $\alpha_0, \dots, \alpha_n$.

A better strategy is to first apply the Gram-Schmidt process to construct an orthogonal basis for the subspace $W$, from which the least squares coefficients are then found by our orthogonal projection formula (4.41). Let us adopt the rescaled version

$$\langle \mathbf{v}, \mathbf{w} \rangle = \frac{1}{m} \sum_{i=1}^{m} v_i\,w_i = \overline{v\,w} \tag{5.60}$$

of the standard dot product† on $\mathbb{R}^m$. If $\mathbf{v}, \mathbf{w}$ represent the sample vectors corresponding to the functions $v(t), w(t)$, then their inner product $\langle \mathbf{v}, \mathbf{w} \rangle$ is equal to the average value of the product function $v(t)\,w(t)$ on the $m$ sample points. In particular, the inner product between our “monomial” basis vectors corresponding to sampling $t^k$ and $t^l$ is

$$\langle \mathbf{t}_k, \mathbf{t}_l \rangle = \frac{1}{m} \sum_{i=1}^{m} t_i^k\,t_i^l = \frac{1}{m} \sum_{i=1}^{m} t_i^{k+l} = \overline{t^{k+l}}, \tag{5.61}$$

which is the averaged sample value of the monomial $t^{k+l}$.

† For weighted least squares, we would use an appropriately weighted inner product.

To keep the formulas reasonably simple, let us further assume‡ that the sample points are evenly spaced and symmetric about 0. The first requirement is that $t_i - t_{i-1} = h$ be independent of $i$, while the second means that if $t_i$ is a sample point, then so is $-t_i$. An example is provided by the seven sample points $-3, -2, -1, 0, 1, 2, 3$. As a consequence of these two assumptions, the averaged sample values of the odd powers of $t$ vanish: $\overline{t^{2i+1}} = 0$. Hence, by (5.61), the sample vectors $\mathbf{t}_k$ and $\mathbf{t}_l$ are orthogonal whenever $k + l$ is odd.

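Applying the Gram-Schmidt algorithm to $\mathbf{t}_0, \mathbf{t}_1, \mathbf{t}_2, \dots$ produces the orthogonal basis vectors $\mathbf{q}_0, \mathbf{q}_1, \mathbf{q}_2, \dots$. Each

$$\mathbf{q}_k = (\, q_k(t_1), \dots, q_k(t_m) \,)^T = c_{k0}\,\mathbf{t}_0 + c_{k1}\,\mathbf{t}_1 + \cdots + c_{kk}\,\mathbf{t}_k \tag{5.62}$$

can be interpreted as the sample vector for a certain degree $k$ interpolating polynomial

$$q_k(t) = c_{k0} + c_{k1}\,t + \cdots + c_{kk}\,t^k.$$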


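Under these assumptions, the first few of these polynomials, along with their corresponding orthogonal sample vectors, are as follows:

$$\begin{array}{lll}
q_0(t) = 1, & \mathbf{q}_0 = \mathbf{t}_0, & \| \mathbf{q}_0 \|^2 = 1, \\[4pt]
q_1(t) = t, & \mathbf{q}_1 = \mathbf{t}_1, & \| \mathbf{q}_1 \|^2 = \overline{t^2}, \\[4pt]
q_2(t) = t^2 - \overline{t^2}, & \mathbf{q}_2 = \mathbf{t}_2 - \overline{t^2}\,\mathbf{t}_0, & \| \mathbf{q}_2 \|^2 = \overline{t^4} - \bigl(\overline{t^2}\bigr)^2, \\[4pt]
q_3(t) = t^3 - \dfrac{\overline{t^4}}{\overline{t^2}}\,t, & \mathbf{q}_3 = \mathbf{t}_3 - \dfrac{\overline{t^4}}{\overline{t^2}}\,\mathbf{t}_1, & \| \mathbf{q}_3 \|^2 = \overline{t^6} - \dfrac{\bigl(\overline{t^4}\bigr)^2}{\overline{t^2}}.
\end{array} \tag{5.63}$$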


With these in hand, the least squares approximating polynomial of degree $n$ to the given data vector $\mathbf{y}$ is given by a linear combination

$$p(t) = a_0\,q_0(t) + a_1\,q_1(t) + a_2\,q_2(t) + \cdots + a_n\,q_n(t). \tag{5.64}$$

The coefficients can now be obtained directly through the orthogonality formula (4.42), and so

$$a_k = \frac{\langle \mathbf{q}_k, \mathbf{y} \rangle}{\| \mathbf{q}_k \|^2}, \qquad k = 0, \dots, n. \tag{5.65}$$

Thus, once we have set up the orthogonal basis, we no longer need to solve any linear system to construct the least squares approximation.

An additional advantage of the orthogonal basis is that, in contrast to the direct method, the formulas (5.65) for the least squares coefficients do not depend on the degree of the approximating polynomial. As a result, one can readily increase the degree, and, in favorable situations, the accuracy of the approximant without having to recompute any of the lower-degree terms. For instance, if a quadratic polynomial $p_2(t) = a_0 + a_1 q_1(t) + a_2 q_2(t)$ is insufficiently accurate, the cubic least squares approximant $p_3(t) = p_2(t) + a_3 q_3(t)$ can be constructed without having to recompute the quadratic coefficients $a_0, a_1, a_2$. This doesn’t work when using the non-orthogonal monomials, all of whose coefficients will be affected by increasing the degree of the approximating polynomial.

‡ The method works without this restriction, but the formulas become more unwieldy. See Exercise 5.5.31 for details.

Figure 5.11. Least Squares Data Approximations.

Example 5.23. Consider the following tabulated sample values:

$$\begin{array}{c|ccccccc}
t_i & -3 & -2 & -1 & 0 & 1 & 2 & 3 \\ \hline
y_i & -1.4 & -1.3 & -.6 & .1 & .9 & 1.8 & 2.9
\end{array}$$

To compute polynomial least squares fits of degrees 1, 2 and 3, we begin by computing the polynomials (5.63), which for the given sample points $t_i$ are

$$q_0(t) = 1, \qquad q_1(t) = t, \qquad q_2(t) = t^2 - 4, \qquad q_3(t) = t^3 - 7t,$$

$$\| \mathbf{q}_0 \|^2 = 1, \qquad \| \mathbf{q}_1 \|^2 = 4, \qquad \| \mathbf{q}_2 \|^2 = 12, \qquad \| \mathbf{q}_3 \|^2 = \tfrac{216}{7}.$$

Thus, to four decimal places, the coefficients (5.64) for the least squares approximation are

$$a_0 = \langle \mathbf{q}_0, \mathbf{y} \rangle = .3429, \qquad a_1 = \tfrac14 \langle \mathbf{q}_1, \mathbf{y} \rangle = .7357,$$

$$a_2 = \tfrac{1}{12} \langle \mathbf{q}_2, \mathbf{y} \rangle = .0738, \qquad a_3 = \tfrac{7}{216} \langle \mathbf{q}_3, \mathbf{y} \rangle = -.0083.$$

To obtain the best linear approximation, we use

$$p_1(t) = a_0\,q_0(t) + a_1\,q_1(t) = .3429 + .7357\,t,$$

with a least squares error of .7081. Similarly, the quadratic and cubic least squares approximations are

$$p_2(t) = .3429 + .7357\,t + .0738\,(t^2 - 4),$$

$$p_3(t) = .3429 + .7357\,t + .0738\,(t^2 - 4) - .0083\,(t^3 - 7t),$$

with respective least squares errors .2093 and .1697 at the sample points. Observe that, as noted above, the lower order coefficients do not change as we increase the degree of the approximating polynomial. A plot of the first three approximations appears in Figure 5.11. The small cubic term does not significantly increase the accuracy of the approximation, and so this data probably comes from sampling a quadratic function.
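The orthogonal basis approach of Example 5.23 can be sketched in a few lines. The code below applies Gram-Schmidt to the monomial sample vectors using the averaged inner product (5.60) and then reads off the coefficients from (5.65). The function names are hypothetical, and the error is measured in the ordinary Euclidean norm of the residual at the sample points, which is an assumption that reproduces the values quoted in the example.

```python
import math

def inner(v, w):
    """Averaged dot product (5.60)."""
    return sum(vi * wi for vi, wi in zip(v, w)) / len(v)

def orthogonal_sample_vectors(t, degree):
    """Gram-Schmidt applied to the monomial sample vectors t_0, ..., t_degree."""
    basis = []
    for k in range(degree + 1):
        v = [ti ** k for ti in t]
        for q in basis:
            c = inner(v, q) / inner(q, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        basis.append(v)
    return basis

t = [-3, -2, -1, 0, 1, 2, 3]
y = [-1.4, -1.3, -0.6, 0.1, 0.9, 1.8, 2.9]

q = orthogonal_sample_vectors(t, 3)
a = [inner(qk, y) / inner(qk, qk) for qk in q]   # coefficients (5.65)

def residual_norm(n):
    """Euclidean norm of y - p at the sample points for the degree n fit."""
    fit = [sum(a[k] * q[k][i] for k in range(n + 1)) for i in range(len(t))]
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, fit)))

errors = [residual_norm(n) for n in (1, 2, 3)]
print(q[2], q[3])      # samples of t^2 - 4 and t^3 - 7t
print(a)
print(errors)
```

Because each coefficient depends only on its own basis vector, raising the degree appends a new entry to `a` without changing the earlier ones.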

Proof of Lemma 5.16: We will establish the rather striking LU factorization of the transposed Vandermonde matrix $V = A^T$, which will immediately prove that, when $t_1, \dots, t_{n+1}$ are distinct, both $V$ and $A$ are nonsingular matrices. The $4 \times 4$ case is instructive for understanding the general pattern. Applying regular Gaussian Elimination, we obtain the explicit LU factorization

$$\begin{pmatrix} 1 & 1 & 1 & 1 \\ t_1 & t_2 & t_3 & t_4 \\ t_1^2 & t_2^2 & t_3^2 & t_4^2 \\ t_1^3 & t_2^3 & t_3^3 & t_4^3 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 & 0 \\ t_1 & 1 & 0 & 0 \\ t_1^2 & t_1 + t_2 & 1 & 0 \\ t_1^3 & t_1^2 + t_1 t_2 + t_2^2 & t_1 + t_2 + t_3 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & t_2 - t_1 & t_3 - t_1 & t_4 - t_1 \\ 0 & 0 & (t_3 - t_1)(t_3 - t_2) & (t_4 - t_1)(t_4 - t_2) \\ 0 & 0 & 0 & (t_4 - t_1)(t_4 - t_2)(t_4 - t_3) \end{pmatrix}.$$

Observe that the pivots, i.e., the diagonal entries of $U$, are all nonzero if the points $t_1, t_2, t_3, t_4$ are distinct. The reader may be able to spot the pattern in the above formula and thus guess the general case. Indeed, the individual entries of the matrices appearing in the factorization

$$V = L\,U \tag{5.66}$$

of the $(n + 1) \times (n + 1)$ Vandermonde matrix are

$$\begin{aligned}
v_{ij} &= t_j^{\,i-1}, & & i, j = 1, \dots, n+1, \\
\ell_{ij} &= \sum_{1 \le k_1 \le \cdots \le k_{i-j} \le j} t_{k_1} t_{k_2} \cdots t_{k_{i-j}}, & & 1 \le j < i \le n+1, \\
\ell_{ii} &= 1, & & i = 1, \dots, n+1, \\
\ell_{ij} &= 0, & & 1 \le i < j \le n+1, \\
u_{ij} &= \prod_{k=1}^{i-1} (t_j - t_k), & & 1 < i \le j \le n+1, \\
u_{1j} &= 1, & & j = 1, \dots, n+1, \\
u_{ij} &= 0, & & 1 \le j < i \le n+1.
\end{aligned} \tag{5.67}$$

Full details of the proof that (5.67) holds can be found in [30,63]. (Surprisingly, as far as we know, these are the first places this factorization appears in the literature.) The entries of $L$ lying below the diagonal are known as the complete monomial polynomials since $\ell_{ij}$ is obtained by summing, with unit coefficients, all monomials of degree $i - j$ in the $j$ variables $t_1, \dots, t_j$. The entries of $U$ appearing on or above the diagonal are known as the Newton difference polynomials. In particular, if $t_1, \dots, t_{n+1}$ are distinct, so $t_i \ne t_j$ for $i \ne j$, then all entries of $U$ lying on or above the diagonal are nonzero. In this case, $V$ has all nonzero pivots, namely, the diagonal entries of $U$, and is a regular, hence nonsingular matrix. Q.E.D.
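The factorization can be checked on a concrete example with exact rational arithmetic. The following sketch builds $L$ and $U$ from the entry formulas, read here with 0-based indices, and compares the product with the transposed Vandermonde matrix. The function names and the sample points are chosen purely for illustration, and the check is a numerical sanity test, not a substitute for the proof cited above.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import prod

def vandermonde_lu(t):
    """Build L and U entrywise for the points t (0-based indices)."""
    n = len(t)
    L = [[Fraction(0)] * n for _ in range(n)]
    U = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # sum of all monomials of degree i - j in the first j + 1 points
            L[i][j] = Fraction(sum(
                prod(c) for c in combinations_with_replacement(t[:j + 1], i - j)))
        for j in range(i, n):
            # product of (t_j - t_k) over the first i points
            U[i][j] = Fraction(prod(t[j] - t[k] for k in range(i)))
    return L, U

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

t = [Fraction(2), Fraction(-1), Fraction(1, 2), Fraction(3)]
V = [[tj ** i for tj in t] for i in range(len(t))]   # transposed Vandermonde
L, U = vandermonde_lu(t)
print(matmul(L, U) == V)
print([U[i][i] for i in range(len(t))])              # the pivots
```

With distinct points every printed pivot is nonzero, as the proof asserts; repeating a point makes one of the diagonal entries of `U` vanish.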


Exercises 

5.5.13.  Find  and  graph  the  polynomial  of  minimal  degree  that  passes  through  the  following 
points:  (a)  (3,-1),  (6,5);  (b)  (-2,4),  (0,6),  (1,10);  (c)  (-2,3),  (0,-1),  (1,-3); 

(d)  (-1,2),  (0,-1),  (1,0),  (2,-1);  (e)  (-2,17),  (-1,-3),  (0,-3),  (1,-1),  (2,9). 
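5.5.14. For the following data values, construct the interpolating polynomial in Lagrange form:

(a) $\begin{array}{c|cc} t_i & -3 & 2 \\ \hline y_i & 1 & 5 \end{array}$

(b) $\begin{array}{c|ccc} t_i & 0 & 1 & 3 \\ \hline y_i & 1 & .5 & .25 \end{array}$

(c) $\begin{array}{c|ccc} t_i & -1 & 0 & 1 \\ \hline y_i & 1 & 2 & -1 \end{array}$

(d) $\begin{array}{c|cccc} t_i & 0 & 1 & 2 & 3 \\ \hline y_i & 0 & 1 & 4 & 9 \end{array}$

(e) $\begin{array}{c|ccccc} t_i & -2 & -1 & 0 & 1 & 2 \\ \hline y_i & -1 & -2 & 2 & 1 & 3 \end{array}$

5.5.15. Given $\begin{array}{c|ccc} t_i & 1 & 2 & 3 \\ \hline y_i & 3 & 6 & 11 \end{array}$ (a) find the straight line $y = \alpha + \beta\,t$ that best fits the data in the least squares sense; (b) find the parabola $y = \alpha + \beta\,t + \gamma\,t^2$ that best fits the data. Interpret the error.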

5.5.16.  Re-solve  Exercise  5.5.15  using  the  respective  weights  2, 1,  .5  at  the  three  data  points. 


5.5.17. The table

$$\begin{array}{l|cccc} \text{time in sec} & 0 & 10 & 20 & 30 \\ \hline \text{meters} & 4500 & 4300 & 3930 & 3000 \end{array}$$

measures the altitude of a falling parachutist before her chute has opened. Predict how many seconds she can wait before reaching the minimum altitude of 1500 meters.


5.5.18. A missile is launched in your direction. Using a range finder, you measure its altitude at the times:

$$\begin{array}{l|cccccc} \text{time in sec} & 0 & 10 & 20 & 30 & 40 & 50 \\ \hline \text{altitude in meters} & 200 & 650 & 970 & 1200 & 1375 & 1130 \end{array}$$

How long until you have to run?


5.5.19.  A  student  runs  an  experiment  six  times  in  an  attempt  to  obtain  an  equation  relating 
two  physical  quantities  x  and  y.  For  x  =  1,  2, 4,  6,  8, 10  units,  the  experiments  result  in 
corresponding  y  values  of  3,  3, 4,  6,  7,  8  units.  Find  and  graph  the  following:  (a)  the  least 
squares  line;  (b)  the  least  squares  quadratic  polynomial;  (c)  the  interpolating  polynomial, 
(d)  Which  do  you  think  is  the  most  likely  theoretical  model  for  this  data? 

5.5.20. (a) Write down the Taylor polynomials of degrees 2 and 4 at $t = 0$ for the function $f(t) = e^t$. (b) Compare their accuracy with the interpolating and least squares polynomials in Examples 5.18 and 5.22.

C 5.5.21. Given the values of $\sin t$ at $t = 0°, 30°, 45°, 60°$, find the following approximations:

(a)  the  least  squares  linear  polynomial;  (b)  the  least  squares  quadratic  polynomial;  (c)  the 
quadratic  Taylor  polynomial  at  t  =  0;  (d)  the  interpolating  polynomial;  (e)  the  cubic 
Taylor  polynomial  at  t  =  0;  (f)  Graph  each  approximation  and  discuss  its  accuracy. 

5.5.22. Find the quartic (degree 4) polynomial that exactly interpolates the function $\tan t$ at the five data points $t_0 = 0$, $t_1 = .25$, $t_2 = .5$, $t_3 = .75$, $t_4 = 1$. Compare the graphs of the two functions over $0 \le t \le \frac12 \pi$.

5.5.23. (a) Find the least squares linear polynomial approximating $\sqrt{t}$ on $[0, 1]$, choosing six different exact data values. (b) How much more accurate is the least squares quadratic polynomial based on the same data?

5.5.24. A table of logarithms contains the following entries:

$$\begin{array}{c|cccc} t & 1.0 & 2.0 & 3.0 & 4.0 \\ \hline \log_{10} t & 0 & .3010 & .4771 & .6021 \end{array}$$

Approximate $\log_{10} e$ by constructing an interpolating polynomial of (a) degree two using the entries at $t = 1.0$, $2.0$, and $3.0$, (b) degree three using all the entries.
0 5.5.25. Let $q(t)$ denote the quadratic interpolating polynomial that goes through the data points $(t_0, y_0)$, $(t_1, y_1)$, $(t_2, y_2)$. (a) Under what conditions does $q(t)$ have a minimum? A maximum? (b) Show that the minimizing/maximizing value is at

$$t^\star = \frac{m_0\,s_1 - m_1\,s_0}{s_1 - s_0}, \qquad \text{where} \quad s_0 = \frac{y_1 - y_0}{t_1 - t_0}, \quad s_1 = \frac{y_2 - y_1}{t_2 - t_1}, \quad m_0 = \frac{t_0 + t_1}{2}, \quad m_1 = \frac{t_1 + t_2}{2}.$$

(c) What is $q(t^\star)$?

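5.5.26. Use the orthogonal sample vectors (5.63) to find the best polynomial least squares fits of degree 1, 2 and 3 for the following sets of data:

(a) $\begin{array}{c|ccccc} t_i & -2 & -1 & 0 & 1 & 2 \\ \hline y_i & 7 & 11 & 13 & 18 & 21 \end{array}$

(b) $\begin{array}{c|ccccccc} t_i & -3 & -2 & -1 & 0 & 1 & 2 & 3 \\ \hline y_i & -2.7 & -2.1 & -.5 & .5 & 1.2 & 2.4 & 3.2 \end{array}$

(c) $\begin{array}{c|ccccccc} t_i & -3 & -2 & -1 & 0 & 1 & 2 & 3 \\ \hline y_i & 60 & 80 & 90 & 100 & 120 & 120 & 130 \end{array}$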

0 5.5.27. (a) Verify the orthogonality of the sample polynomial vectors in (5.63). (b) Construct the next orthogonal sample polynomial $q_4(t)$ and the norm of its sample vector. (c) Use your result to compute the quartic least squares approximation for the data in Example 5.23.


5.5.28.  Use  the  result  of  Exercise  5.5.27  to  find  the  best  approximating  polynomial  of  degree  4 
to  the  data  in  Exercise  5.5.26. 


5.5.29. Justify the fact that the orthogonal sample vector $\mathbf{q}_k$ in (5.62) is a linear combination of only the first $k$ monomial sample vectors.


U 5.5.30. The formulas (5.63) apply only when the sample times are symmetric around 0. When the sample points $t_1, \dots, t_n$ are equally spaced, so $t_{i+1} - t_i = h$ for all $i = 1, \dots, n - 1$, then there is a simple trick to convert the least squares problem into a symmetric form. (a) Show that the translated sample points $s_i = t_i - \overline{t}$, where $\overline{t} = \frac{1}{n} \sum_{i=1}^{n} t_i$ is the average, are symmetric around 0. (b) Suppose $q(s)$ is the least squares polynomial for the data points $(s_i, y_i)$. Prove that $p(t) = q(t - \overline{t})$ is the least squares polynomial for the original data $(t_i, y_i)$. (c) Apply this method to find the least squares polynomials of degrees 1 and 2 for the following data:


$$\begin{array}{c|cccccc} t_i & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline y_i & -8 & -6 & -4 & -1 & 1 & 3 \end{array}$$

0 5.5.31. Construct the first three orthogonal basis elements for sample points $t_1, \dots, t_m$ that are in general position.

5.5.32. Use $n + 1$ equally spaced data points to interpolate $f(t) = 1/(1 + t^2)$ on an interval $-a \le t \le a$ for $a = 1, 1.5, 2, 2.5, 3$ and $n = 2, 4, 10, 20$. Do all intervals exhibit the pathology illustrated in Figure 5.9? If not, how large can $a$ be before the interpolants have poor approximation properties? What happens when the number of interpolation points is taken to be $n = 50$?

4b 5.5.33. Repeat Exercise 5.5.32 for the hyperbolic secant function $f(t) = \operatorname{sech} t = 1/\cosh t$.

5.5.34.  Given  A  as  in  (5.50)  with  m  <  n  +  1,  how  would  you  characterize  those  polynomials 
p(t)  whose  coefficient  vector  x  lies  in  ker  A? 

5.5.35.  (a)  Give  an  example  of  an  interpolating  polynomial  through  n  +  1  points  that  has  degree 
<  n.  (b)  Can  you  explain,  without  referring  to  the  explicit  formulas,  why  all  the  Lagrange 
interpolating  polynomials  based  on  n  +  1  points  must  have  degree  equal  to  n? 
5.5.36. Let $x_1, \dots, x_n$ be distinct real numbers. Prove that the $n \times n$ matrix $K$ with entries $k_{ij} = \dfrac{1 - (x_i x_j)^n}{1 - x_i x_j}$ is positive definite.

0 5.5.37. Prove the determinant formula $\det A = \prod_{1 \le i < j \le n+1} (t_j - t_i)$ for the $(n + 1) \times (n + 1)$ Vandermonde matrix defined in (5.50).

0 5.5.38. (a) Prove that a polynomial $p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$ of degree $\le n$ vanishes at $n + 1$ distinct points, so $p(x_1) = p(x_2) = \cdots = p(x_{n+1}) = 0$, if and only if $p(x) \equiv 0$ is the zero polynomial. (b) Prove that the monomials $1, x, x^2, \dots, x^n$ are linearly independent. (c) Explain why $p(x) \equiv 0$ if and only if all its coefficients $a_0 = a_1 = \cdots = a_n = 0$. Hint: Use Lemma 5.16 and Exercise 2.3.37.


C 5.5.39. Numerical differentiation: The most common numerical methods for approximating the derivatives of a function are based on interpolation. To approximate the $k$th derivative $f^{(k)}(x_0)$ at a point $x_0$, one replaces the function $f(x)$ by an interpolating polynomial $p_n(x)$ of degree $n \ge k$ based on the nearby points $x_0, \dots, x_n$ (the point $x_0$ is almost always included as an interpolation point), leading to the approximation $f^{(k)}(x_0) \approx p_n^{(k)}(x_0)$. Use this method to construct numerical approximations to (a) $f'(x)$ using a quadratic interpolating polynomial based on $x - h$, $x$, $x + h$. (b) $f''(x)$ with the same quadratic polynomial. (c) $f'(x)$ using a quadratic interpolating polynomial based on $x$, $x + h$, $x + 2h$. (d) $f'(x)$, $f''(x)$, $f'''(x)$ and $f^{(iv)}(x)$ using a quartic interpolating polynomial based on $x - 2h$, $x - h$, $x$, $x + h$, $x + 2h$. (e) Test your methods by approximating the derivatives of $e^x$ and $\tan x$ at $x = 0$ with step sizes $h = \frac{1}{10}, \frac{1}{100}, \frac{1}{1000}, \frac{1}{10000}$. Discuss the accuracies you observe. Can the step size be arbitrarily small? (f) Why do you need $n \ge k$?


T 5.5.40. Numerical integration: Most numerical methods for evaluating a definite integral $\int_a^b f(x)\,dx$ are based on interpolation. One chooses $n + 1$ interpolation points $a \le x_0 < x_1 < \cdots < x_n \le b$ and replaces the integrand by its interpolating polynomial $p_n(x)$ of degree $n$, leading to the approximation $\int_a^b f(x)\,dx \approx \int_a^b p_n(x)\,dx$, where the polynomial integral can be done explicitly. Write down the following popular integration rules:

(a) Trapezoid Rule: $x_0 = a$, $x_1 = b$. (b) Simpson’s Rule: $x_0 = a$, $x_1 = \frac12 (a + b)$, $x_2 = b$.

(c) Simpson’s $\frac38$ Rule: $x_0 = a$, $x_1 = \frac13 (2a + b)$, $x_2 = \frac13 (a + 2b)$, $x_3 = b$.

(d) Midpoint Rule: $x_0 = \frac12 (a + b)$. (e) Open Rule: $x_0 = \frac13 (2a + b)$, $x_1 = \frac13 (a + 2b)$.

(f) Test your methods for accuracy on the following integrals:

(i) $\int_0^1 e^x\,dx$, (ii) $\int_0^{\pi} \sin x\,dx$, (iii) $\int_1^{e} \log x\,dx$, (iv) $\int_0^{\pi/2} \sqrt{x^3 + 1}\,dx$.

Note: For more details on numerical differentiation and integration, you are encouraged to consult a basic numerical analysis text, e.g., [8].


Approximation  and  Interpolation  by  General  Functions 

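There is nothing special about polynomial functions in the preceding approximation and interpolation schemes. For example, suppose we are interested in determining the best trigonometric approximation

$$y = \alpha_1 \cos t + \alpha_2 \sin t$$

to a given set of data. Again, the least squares error takes the same form as in (5.49), namely $\| \mathbf{e} \|^2 = \| \mathbf{y} - A\,\mathbf{x} \|^2$, where

$$A = \begin{pmatrix} \cos t_1 & \sin t_1 \\ \cos t_2 & \sin t_2 \\ \vdots & \vdots \\ \cos t_m & \sin t_m \end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}.$$

Thus, the columns of $A$ are the sampled values of the functions $\cos t$, $\sin t$. The key requirement is that the unspecified parameters, in this case $\alpha_1, \alpha_2$, occur linearly in the approximating function. Thus, the most general case is to approximate the data (5.38) by a linear combination

$$y(t) = \alpha_1 h_1(t) + \alpha_2 h_2(t) + \cdots + \alpha_n h_n(t)$$

of prescribed functions $h_1(t), \dots, h_n(t)$. The total squared error is, as always, given by

$$\text{Squared Error} = \sum_{i=1}^{m} \bigl[\, y_i - y(t_i) \,\bigr]^2 = \| \mathbf{y} - A\,\mathbf{x} \|^2,$$

where the sample matrix $A$, the vector of unknown coefficients $\mathbf{x}$, and the data vector $\mathbf{y}$ are

$$A = \begin{pmatrix} h_1(t_1) & h_2(t_1) & \cdots & h_n(t_1) \\ h_1(t_2) & h_2(t_2) & \cdots & h_n(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ h_1(t_m) & h_2(t_m) & \cdots & h_n(t_m) \end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}. \tag{5.68}$$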


If $A$ is square and nonsingular, then we can find an interpolating function of the prescribed form by solving the linear system

$$A\,\mathbf{x} = \mathbf{y}. \tag{5.69}$$

A particularly important case is provided by the $2n + 1$ trigonometric functions

$$1, \quad \cos x, \quad \sin x, \quad \cos 2x, \quad \sin 2x, \quad \dots, \quad \cos nx, \quad \sin nx.$$

Interpolation on $2n + 1$ equally spaced data points on the interval $[0, 2\pi]$ leads to the Discrete Fourier Transform, used in signal processing, data transmission, and compression, and to be the focus of Section 5.6.

If there are more than $n$ data points, then we cannot, in general, interpolate exactly, and must content ourselves with a least squares approximation that minimizes the error at the sample points as best it can. The least squares solution to the interpolation equations (5.69) is found by solving the associated normal equations $K\mathbf{x} = \mathbf{f}$, where the $(i, j)$ entry of $K = A^T A$ is $m$ times the average sample value of the product of $h_i(t)$ and $h_j(t)$, namely

$$k_{ij} = m\,\overline{h_i(t)\,h_j(t)} = \sum_{\kappa=1}^{m} h_i(t_\kappa)\,h_j(t_\kappa), \tag{5.70}$$

whereas the $i$th entry of $\mathbf{f} = A^T \mathbf{y}$ is

$$f_i = m\,\overline{h_i(t)\,y} = \sum_{\kappa=1}^{m} h_i(t_\kappa)\,y_\kappa. \tag{5.71}$$
The one issue is whether the columns of the sample matrix $A$ are linearly independent, which is a more subtle issue than the polynomial case covered by Lemma 5.16. Linear independence of the sampled function vectors is, in general, more restrictive than merely requiring the functions themselves to be linearly independent; see Exercise 2.3.37 for details.
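As an illustration of the general scheme, the sketch below assembles the normal equations entrywise from sums of the form (5.70) and (5.71) for an arbitrary list of prescribed functions and solves them. The names `fit` and `solve` are hypothetical helpers, and the test data are synthetic: they are generated from $2 \cos t - \sin t$ so that the expected coefficients are known in advance. The code assumes the sampled function vectors are linearly independent and does not check for this.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(functions, t, y):
    """Least squares coefficients for y(t) = sum of alpha_i * h_i(t)."""
    K = [[sum(hi(tk) * hj(tk) for tk in t) for hj in functions]
         for hi in functions]                                   # entries (5.70)
    f = [sum(hi(tk) * yk for tk, yk in zip(t, y)) for hi in functions]  # (5.71)
    return solve(K, f)

t = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [2 * math.cos(tk) - math.sin(tk) for tk in t]    # synthetic data
alpha = fit([math.cos, math.sin], t, y)
print(alpha)
```

Because the synthetic data lie exactly in the span of the two sampled functions, the recovered coefficients should match the generating values up to rounding; with noisy data the same call returns the least squares fit.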

If  the  parameters  do  not  occur  linearly  in  the  functional  formula,  then  we  cannot  use 
linear  algebra  to  effect  a  least  squares  approximation.  For  example,  one  cannot  use  least 
squares to determine the frequency $\omega$, the amplitude $r$, and the phase shift $\delta$ of the general
trigonometric approximation

$$y = c_1 \cos \omega t + c_2 \sin \omega t = r \cos(\omega t + \delta)$$

that  minimizes  the  least  squares  error  at  the  sample  points.  Approximating  data  by  such 
a  function  constitutes  a  nonlinear  optimization  problem. 
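To see where the nonlinearity enters, it may help to expand the amplitude-phase form with the angle addition formula. The identification below is a standard trigonometric identity, added here as a worked step rather than quoted from the text:

$$r\cos(\omega t + \delta) = (r\cos\delta)\cos\omega t - (r\sin\delta)\sin\omega t, \qquad\text{so}\qquad c_1 = r\cos\delta, \quad c_2 = -\,r\sin\delta, \quad r = \sqrt{c_1^2 + c_2^2}, \quad \tan\delta = -\,c_2/c_1.$$

Thus, if the frequency $\omega$ is known in advance, the coefficients $c_1, c_2$ enter linearly and can be found from the normal equations, after which $r$ and $\delta$ follow from these formulas. It is the unknown $\omega$ inside the cosine and sine that makes the full fitting problem nonlinear.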


Exercises 


5.5.41. Given the values

    t_i |  0    .5    1
    y_i |  1    .5    .25

construct the trigonometric function of the form $g(t) = a \cos \pi t + b \sin \pi t$ that best approximates the data in the least squares sense.

5.5.42. Find the hyperbolic function $g(t) = a \cosh t + b \sinh t$ that best approximates the data in
Exercise 5.5.41.

5.5.43. (a) Find the exponential function of the form $g(t) = a e^t + b e^{2t}$ that best approximates
$t^2$ in the least squares sense based on the sample points 0, 1, 2, 3, 4. (b) What is the
least squares error? (c) Compare the graphs on the interval [0, 4]. Where is the
approximation the worst? (d) How much better can you do by including a constant term
in $g(t) = a e^t + b e^{2t} + c$?

5.5.44. (a) Find the best trigonometric approximation of the form $g(t) = r \cos(t + \delta)$ to t using
5 and 9 equally spaced sample points on $[0, \pi]$.

(b) Can you answer the question for $g(t) = r_1 \cos(t + \delta_1) + r_2 \cos(2t + \delta_2)$?

C 5.5.45. A trigonometric polynomial of degree n is a function of the form

$$p(t) = a_0 + a_1 \cos t + b_1 \sin t + a_2 \cos 2t + b_2 \sin 2t + \cdots + a_n \cos nt + b_n \sin nt,$$

where $a_0, a_1, b_1, \ldots, a_n, b_n$ are the coefficients. Find the trigonometric polynomial of degree
n that is the least squares approximation to the function $f(t) = 1/(1 + t^2)$ on the interval
$[-\pi, \pi]$ based on the k equally spaced data points $t_j = -\pi + \dfrac{2\pi j}{k}$, $j = 0, \ldots, k - 1$
(omitting the right-hand endpoint), when (a) n = 1, k = 4, (b) n = 2, k = 8,
(c) n = 2, k = 16, (d) n = 3, k = 16. Compare the graphs of the trigonometric
approximant and the function, and discuss. (e) Why do we not include the right-hand
endpoint $t_k = \pi$?

C 5.5.46. The sinc functions are defined as $S_0(x) = \dfrac{\sin(\pi x/h)}{\pi x/h}$, while $S_j(x) = S_0(x - jh)$
whenever $h > 0$ and j is an integer. We will interpolate a function f(x) at the mesh
points $x_j = jh$, $j = 0, \ldots, n$, by a linear combination of sinc functions: $S(x) = c_0 S_0(x) + \cdots + c_n S_n(x)$.
What are the coefficients $c_j$? Graph and discuss the accuracy
of the sinc interpolant for the functions $x^2$ and $\tfrac12 - \left| x - \tfrac12 \right|$ on the interval [0, 1] using
h = .25, .1, and .025.


Least Squares Approximation in Function Spaces

So  far,  while  we  have  used  least  squares  minimization  to  interpolate  and  approximate 
known,  complicated  functions  by  simpler  polynomials,  we  have  only  worried  about  the 
errors  committed  at  a  discrete,  preassigned  set  of  sample  points.  A  more  uniform  approach 
would  be  to  take  into  account  the  errors  at  all  points  in  the  interval  of  interest.  This  can 
be  accomplished  by  replacing  the  discrete,  finite-dimensional  vector  space  norm  on  sample 
vectors  by  a  continuous,  infinite-dimensional  function  space  norm  in  order  to  specify  the 
least  squares  error  that  must  be  minimized  over  the  entire  interval. 

More precisely, we let $V = C^0[a, b]$ denote the space of continuous functions on the
bounded interval $[a, b]$ with $L^2$ inner product and norm

$$\langle f, g \rangle = \int_a^b f(t)\,g(t)\,dt, \qquad \|f\| = \sqrt{\int_a^b f(t)^2\,dt}\,. \tag{5.72}$$

Let $P^{(n)} \subset C^0[a, b]$ denote the subspace consisting of all polynomials of degree $\le n$. For
simplicity, we employ the standard monomial basis $1, t, t^2, \ldots, t^n$. We will be approximating
a general function $f(t) \in C^0[a, b]$ by a polynomial

$$p(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_n t^n \tag{5.73}$$


of degree at most n. The error function $e(t) = f(t) - p(t)$ measures the discrepancy between
the function and its approximating polynomial at each t. Instead of summing the squares
of the errors at a finite set of sample points, we pass to a continuous limit that
integrates the squared errors over all points in the interval. Thus, the approximating
polynomial will be characterized as the one that minimizes the total $L^2$ squared error:

$$\text{Squared Error} = \|p - f\|^2 = \int_a^b [\,p(t) - f(t)\,]^2\,dt. \tag{5.74}$$


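To solve the problem of minimizing the squared error, we begin by substituting (5.73)
into (5.74) and expanding, as in (5.20):

$$\begin{aligned}
\|p - f\|^2 &= \Big\| \sum_{i=0}^{n} \alpha_i t^i - f(t) \Big\|^2
= \Big\langle \sum_{i=0}^{n} \alpha_i t^i - f(t),\ \sum_{j=0}^{n} \alpha_j t^j - f(t) \Big\rangle \\
&= \sum_{i,j=0}^{n} \alpha_i \alpha_j \langle t^i, t^j \rangle - 2 \sum_{i=0}^{n} \alpha_i \langle t^i, f(t) \rangle + \|f(t)\|^2 .
\end{aligned}$$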

As a result, we are required to minimize a quadratic function of the standard form

$$x^T K x - 2\,x^T f + c, \tag{5.75}$$

where $x = (\alpha_0, \alpha_1, \ldots, \alpha_n)^T$ is the vector containing the unknown coefficients in the
minimizing polynomial (5.73), $c = \|f\|^2$, and†

$$k_{ij} = \langle t^i, t^j \rangle = \int_a^b t^{i+j}\,dt, \qquad f_i = \langle t^i, f \rangle = \int_a^b t^i f(t)\,dt, \tag{5.76}$$

are the entries of the Gram matrix K, consisting of inner products between basis monomials, and of
the vector f of inner products between the monomials and the given function. The coefficients
of the least squares minimizing polynomial are thus found by solving the associated
normal equations $K x = f$.

† Here, the indices i, j labeling the entries of the $(n + 1) \times (n + 1)$ matrix K and vectors
$x, f \in \mathbb{R}^{n+1}$ range from 0 to n instead of 1 to n + 1.

Figure 5.12. Quadratic Least Squares Approximation of $e^t$.

Example 5.24. Let us return to the problem of approximating the exponential function
$f(t) = e^t$ by a quadratic polynomial on the interval $0 \le t \le 1$, but now with the
least squares error being measured by the $L^2$ norm. Thus, we consider the subspace $P^{(2)}$
consisting of all quadratic polynomials

$$p(t) = \alpha + \beta t + \gamma t^2 .$$

Using the monomial basis $1, t, t^2$, the coefficient matrix is the Gram matrix K consisting
of the inner products

$$\langle t^i, t^j \rangle = \int_0^1 t^{i+j}\,dt = \frac{1}{i + j + 1}$$

between basis monomials, while the right-hand side is the vector of inner products

$$\langle t^i, e^t \rangle = \int_0^1 t^i e^t\,dt.$$

The solution to the normal system

$$\begin{pmatrix} 1 & \tfrac12 & \tfrac13 \\ \tfrac12 & \tfrac13 & \tfrac14 \\ \tfrac13 & \tfrac14 & \tfrac15 \end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} =
\begin{pmatrix} e - 1 \\ 1 \\ e - 2 \end{pmatrix}$$

is computed to be

$$\alpha = 39e - 105 \approx 1.012991, \qquad \beta = -216e + 588 \approx .851125, \qquad \gamma = 210e - 570 \approx .839184,$$

leading to the least squares quadratic approximant

$$p^*(t) = 1.012991 + .851125\,t + .839184\,t^2, \tag{5.77}$$

plotted in Figure 5.12. The least squares error is

$$\|e^t - p^*(t)\|^2 \approx .000027835.$$

The maximal error over the interval is measured by the $L^\infty$ norm of the difference:

$$\|e^t - p^*(t)\|_\infty = \max \{\, |e^t - p^*(t)| \ :\ 0 \le t \le 1 \,\} \approx .014981815,$$

with the maximum occurring at t = 1. Thus, the simple quadratic polynomial (5.77)
will give a reasonable approximation to the first two decimal places in $e^t$ on the entire
interval [0, 1]. A more accurate approximation can be constructed by taking a higher
degree polynomial, or by decreasing the length of the interval.
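The normal system of Example 5.24 is small enough to check numerically. The following is a minimal sketch in plain Python; the helper `solve` is a hypothetical name, and the squared error is evaluated through the identity $x^T K x - 2 x^T f + c = c - x^T f$, which holds at the minimizer because $K x = f$ there.

```python
import math

def solve(K, f):
    """Gaussian elimination with partial pivoting (dense, for small systems)."""
    n = len(f)
    M = [row[:] + [f[i]] for i, row in enumerate(K)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x

# Gram matrix of 1, t, t^2 on [0, 1]: entries 1/(i + j + 1).
K = [[1.0 / (i + j + 1) for j in range(3)] for i in range(3)]
# Inner products <t^i, e^t> on [0, 1] for i = 0, 1, 2.
f = [math.e - 1.0, 1.0, math.e - 2.0]

alpha, beta, gamma = solve(K, f)

# Squared L2 error at the minimizer: ||e^t||^2 - x.f, with ||e^t||^2 = (e^2 - 1)/2.
c = (math.e ** 2 - 1.0) / 2.0
err2 = c - (alpha * f[0] + beta * f[1] + gamma * f[2])
print(alpha, beta, gamma, err2)
```

The printed coefficients can be compared with the closed forms $39e - 105$, $-216e + 588$, $210e - 570$ quoted in the example.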


Remark. Although the least squares polynomial (5.77) minimizes the $L^2$ norm of the
error, it does slightly worse with the $L^\infty$ norm than the previous sample-based minimizer
(5.52). The problem of finding the quadratic polynomial that minimizes the $L^\infty$ norm is
more difficult, and must be solved by nonlinear minimization techniques.

Remark. As noted in Example 3.39, the Gram matrix for the simple monomial basis on [0, 1] is
a Hilbert matrix (1.72), of size $(n + 1) \times (n + 1)$ for the basis $1, t, \ldots, t^n$. The ill-conditioned
nature of the Hilbert matrix and the consequential difficulty in accurately solving the normal
equations complicate the practical numerical implementation of high-degree least squares
polynomial approximations. A better approach, based on an orthogonal polynomial basis, will be discussed next.
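To get a feeling for how quickly the conditioning deteriorates, one can compute condition numbers of small Hilbert matrices exactly. The sketch below uses Python's `fractions` module and the infinity norm (maximal absolute row sum); the function names and the choice of norm are assumptions made for illustration.

```python
from fractions import Fraction

def hilbert(n):
    """n x n Hilbert matrix with exact rational entries 1/(i + j + 1)."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def inverse(M):
    """Exact Gauss-Jordan inverse of a nonsingular matrix of Fractions."""
    n = len(M)
    aug = [row[:] + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for k in range(n):
        p = next(r for r in range(k, n) if aug[r][k] != 0)
        aug[k], aug[p] = aug[p], aug[k]
        piv = aug[k][k]
        aug[k] = [v / piv for v in aug[k]]
        for r in range(n):
            if r != k and aug[r][k] != 0:
                m = aug[r][k]
                aug[r] = [a - m * b for a, b in zip(aug[r], aug[k])]
    return [row[n:] for row in aug]

def norm_inf(M):
    """Matrix infinity norm: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)

def cond_inf(n):
    H = hilbert(n)
    return norm_inf(H) * norm_inf(inverse(H))

for n in (3, 6, 9):
    print(n, float(cond_inf(n)))
```

Since the arithmetic is exact, the growth in the printed values reflects the matrices themselves rather than rounding in the computation.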


Exercises 


5.5.47.  Approximate  the  function  fit)  =  \ft  using  the  least  squares  method  based  on  the  L2 
norm  on  the  interval  [0, 1]  by  (a)  a  straight  line;  (b)  a  parabola;  (c)  a  cubic  polynomial. 

5.5.48.  Approximate  the  function  f(t)  =  g  (2 1  —  1)  +  ^  by  a  quadratic  polynomial  on  the 
interval  [  —  1, 1]  using  the  least  squares  method  based  on  the  L2  norm.  Compare  the  graphs. 
Where  is  the  error  the  largest? 


5.5.49. For the function $f(t) = \sin t$ determine the approximating linear and quadratic
polynomials that minimize the least squares error based on the $L^2$ norm on $[0, \tfrac12 \pi]$.


5.5.50. Find the quartic (degree 4) polynomial that best approximates the function $e^t$ on the
interval [0, 1] by minimizing the $L^2$ error (5.72).


C 5.5.51. (a) Find the quadratic interpolant to f(x) = x on the interval [0, 1] based on equally
spaced data points. (b) Find the quadratic least squares approximation based on the data
points 0, .25, .5, .75, 1. (c) Find the quadratic least squares approximation with respect to
the $L^2$ norm. (d) Discuss the strengths and weaknesses of each approximation.

5.5.52.  Let  f(x)  =  x.  Find  the  trigonometric  function  of  the  form  g(x)  =  a  +  bcosx  +  csinx 


that  minimizes  the  L2  error  \\g  —  f 


0 5.5.53. Let $g_1(t), \ldots, g_n(t)$ be prescribed, linearly independent functions. Explain how to best
approximate a function f(t) by a linear combination $c_1 g_1(t) + \cdots + c_n g_n(t)$ when the
least squares error is measured in a weighted $L^2$ norm $\|f\|_w^2 = \int_a^b f(t)^2\,w(t)\,dt$ with weight
function $w(t) > 0$.

5.5.54. (a) Find the quadratic least squares approximation to f(t) = t on the interval [0, 1]
with weights (i) $w(t) = 1$, (ii) $w(t) = t$, (iii) $w(t) = e^{-t}$. (b) Compare the errors. Which
gives the best result over the entire interval?

5.5.55. Let $f_a(x) = \dfrac{a}{1 + a^4 x^2}$. Prove that (a) $\|f_a\|_2 = \sqrt{\pi/2}$, where $\|\cdot\|_2$ denotes the $L^2$
norm on $(-\infty, \infty)$. (b) $\|f_a\|_\infty = a$, where $\|\cdot\|_\infty$ denotes the $L^\infty$ norm on $(-\infty, \infty)$.
(c) Use this example to explain why having a small least squares error does not necessarily
mean that the functions are everywhere close.

5.5.56. Find the plane $z = \alpha + \beta x + \gamma y$ that best approximates the following functions on the
square $S = \{0 \le x \le 1,\ 0 \le y \le 1\}$ using the $L^2$ norm $\|f\|^2 = \iint_S |f(x, y)|^2\,dx\,dy$ to
measure the least squares error: (a) $x^2 + y^2$, (b) $x^3 - y^3$, (c) $\sin \pi x \sin \pi y$.

5.5.57. Find the radial polynomial $p(x, y) = a + b\,r + c\,r^2$, where $r^2 = x^2 + y^2$, that best
approximates the function f(x, y) = x using the $L^2$ norm on the unit disk $D = \{r \le 1\}$ to
measure the least squares error.


Orthogonal  Polynomials  and  Least  Squares 


In  a  similar  fashion,  the  orthogonality  of  Legendre  polynomials  and  their  relatives  serves 
to  simplify  the  construction  of  least  squares  approximants  in  function  space.  Suppose,  for 
instance, that our goal is to approximate the exponential function $e^t$ by a polynomial on
the interval $-1 \le t \le 1$, where the least squares error is measured using the standard $L^2$
norm. We will write the best least squares approximant as a linear combination of the
Legendre polynomials,

$$p(t) = a_0 P_0(t) + a_1 P_1(t) + \cdots + a_n P_n(t) = a_0 + a_1 t + a_2 \left( \tfrac32 t^2 - \tfrac12 \right) + \cdots. \tag{5.78}$$


By orthogonality, the least squares coefficients can be immediately computed by the inner
product formula (4.42), so, by the Rodrigues formula (4.61),

$$a_k = \frac{\langle e^t, P_k \rangle}{\|P_k\|^2} = \frac{2k + 1}{2} \int_{-1}^{1} e^t\,P_k(t)\,dt. \tag{5.79}$$


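For example, the quadratic approximation is given by the first three terms in (5.78), whose
coefficients are

$$\begin{aligned}
a_0 &= \frac12 \int_{-1}^{1} e^t\,dt = \frac12 \left( e - \frac1e \right) \approx 1.175201, \\
a_1 &= \frac32 \int_{-1}^{1} t\,e^t\,dt = \frac3e \approx 1.103638, \\
a_2 &= \frac52 \int_{-1}^{1} \left( \tfrac32 t^2 - \tfrac12 \right) e^t\,dt = \frac52 \left( e - \frac7e \right) \approx .357814.
\end{aligned}$$

Graphs appear in Figure 5.13; the first shows $e^t$, the second its quadratic approximant

$$e^t \approx 1.175201 + 1.103638\,t + .357814 \left( \tfrac32 t^2 - \tfrac12 \right), \tag{5.80}$$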


and  the  third  compares  the  two  by  laying  them  on  top  of  each  other. 

As in the discrete case, there are two major advantages of the orthogonal Legendre polynomials
over the direct approach presented in Example 5.24. First, we do not need to solve
any linear systems of equations since the required coefficients (5.79) are found by direct
integration. Indeed, the coefficient matrix for polynomial least squares approximation based
on the monomial basis is some variant of the notoriously ill-conditioned Hilbert matrix
(1.72), and the computation of an accurate solution can be tricky. Our precomputation
of an orthogonal system of polynomials has successfully circumvented the ill-conditioned
normal system.

Figure 5.13. Quadratic Least Squares Approximation to $e^t$.

The second advantage is that the coefficients $a_k$ do not depend on the degree of the
approximating polynomial. Thus, if the quadratic approximation (5.80) is insufficiently
accurate, we merely append the cubic correction $a_3 P_3(t)$ whose coefficient is given by

$$a_3 = \frac72 \int_{-1}^{1} \left( \tfrac52 t^3 - \tfrac32 t \right) e^t\,dt = \frac72 \left( \frac{37}{e} - 5e \right) \approx .070456.$$

Unlike the earlier method, there is no need to recompute the coefficients $a_0, a_1, a_2$, and
hence the cubic least squares approximant is

$$e^t \approx 1.175201 + 1.103638\,t + .357814 \left( \tfrac32 t^2 - \tfrac12 \right) + .070456 \left( \tfrac52 t^3 - \tfrac32 t \right). \tag{5.81}$$

And, if we desire yet further accuracy, we need only compute the next one or two coefficients.
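The coefficient formula (5.79) is easy to evaluate numerically. The sketch below is one possible approach rather than the method of the text: it generates $P_k$ by the standard three-term recurrence $(j + 1) P_{j+1}(t) = (2j + 1)\,t\,P_j(t) - j\,P_{j-1}(t)$ instead of the Rodrigues formula, and approximates the integral by the composite Simpson rule. The function names and the number of subintervals are assumptions.

```python
import math

def legendre(k, t):
    """Evaluate the Legendre polynomial P_k(t) by the three-term recurrence."""
    if k == 0:
        return 1.0
    p_prev, p = 1.0, t
    for j in range(1, k):
        p_prev, p = p, ((2 * j + 1) * t * p - j * p_prev) / (j + 1)
    return p

def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

def coeff(f, k):
    """Least squares coefficient a_k = (2k + 1)/2 * integral of f(t) P_k(t) over [-1, 1]."""
    return (2 * k + 1) / 2 * simpson(lambda t: f(t) * legendre(k, t), -1.0, 1.0)

a = [coeff(math.exp, k) for k in range(4)]
print(a)
```

Raising the degree only appends entries to the list `a`; the earlier coefficients are unchanged, which is the second advantage described above.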

To exploit orthogonality, each interval and norm requires the construction of a corresponding
system of orthogonal polynomials. Let us reconsider Example 5.24, in which
we used the method of least squares to approximate $e^t$ based on the $L^2$ norm on [0, 1].
Here, the ordinary Legendre polynomials are no longer orthogonal, and so we must use the
rescaled Legendre polynomials (4.74) instead. Thus, the quadratic least squares approximant
can be written as

$$\begin{aligned}
p(t) &= a_0 + a_1 \widetilde{P}_1(t) + a_2 \widetilde{P}_2(t) = 1.718282 + .845155\,(2t - 1) + .139864\,(6t^2 - 6t + 1) \\
&= 1.012991 + .851125\,t + .839184\,t^2,
\end{aligned}$$

where the coefficients

$$a_k = (2k + 1) \int_0^1 \widetilde{P}_k(t)\,e^t\,dt$$

are found by direct integration:

$$\begin{aligned}
a_0 &= \int_0^1 e^t\,dt = e - 1 \approx 1.718282, \qquad a_1 = 3 \int_0^1 (2t - 1)\,e^t\,dt = 3(3 - e) \approx .845155, \\
a_2 &= 5 \int_0^1 (6t^2 - 6t + 1)\,e^t\,dt = 5(7e - 19) \approx .139864.
\end{aligned}$$

It is worth emphasizing that this is the same approximating polynomial we found earlier
in (5.77). The use of an orthogonal system of polynomials merely streamlines the
computation.
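As a quick consistency check (a worked expansion added here, using only the coefficient values quoted above), one can collect powers of t in $1.718282 + .845155\,(2t - 1) + .139864\,(6t^2 - 6t + 1)$:

$$\begin{aligned}
\text{constant term:} \quad & 1.718282 - .845155 + .139864 = 1.012991, \\
\text{coefficient of } t: \quad & 2(.845155) - 6(.139864) = .851126, \\
\text{coefficient of } t^2: \quad & 6(.139864) = .839184.
\end{aligned}$$

These agree with (5.77), the coefficient of t matching .851125 up to rounding in the last displayed digit.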
Exercises


5.5.58. Use the Legendre polynomials to find the best (a) quadratic, and (b) cubic
approximation to $t^4$, based on the $L^2$ norm on $[-1, 1]$.

5.5.59. Repeat Exercise 5.5.58 using the $L^2$ norm on [0, 1].

5.5.60. Find the best cubic approximation to $f(t) = e^t$ based on the $L^2$ norm on [0, 1].

5.5.61. Find the (a) linear, (b) quadratic, and (c) cubic polynomials q(t) that minimize the
following integral: $\int [\,q(t) - t^3\,]^2\,dt$. What is the minimum value in each case?

5.5.62.  Find  the  best  quadratic  and  cubic  approximations  for  sint  for  the  L2  norm  on  [0,7r]  by 
using  an  orthogonal  basis.  Graph  your  results  and  estimate  the  maximal  error. 

4b 5.5.63. Answer Exercise 5.5.60 when $f(t) = \sin t$. Use a computer to numerically evaluate the
integrals.

4b 5.5.64. Find the degree 6 least squares polynomial approximation to $e^t$ on the interval $[-1, 1]$
under the $L^2$ norm.


5.5.65.  (a)  Use  the  polynomials  and  weighted  norm  from  Exercise  4.5.12  to  find  the  quadratic 
least squares approximation to $f(t) = 1/t$. In what sense is your quadratic approximation
“best”?  (b)  Now  find  the  best  approximating  cubic  polynomial,  (c)  Compare  the  graphs 
of  the  quadratic  and  cubic  approximants  with  the  original  function  and  discuss  what  you 
observe. 


4b  5.5.66.  Use  the  Laguerre  polynomials  (4.68)  to  find  the  quadratic  and  cubic  polynomial  least 

squares approximation to $f(t) = \tan^{-1} t$ relative to the weighted inner product (4.66). Use
a  computer  to  evaluate  the  coefficients.  Graph  your  result  and  discuss  what  you  observe. 


Splines 

In  pre-CAD  (computer  aided  design)  draftsmanship,  a  spline  was  a  long,  thin,  flexible 
strip  of  wood  or  metal  that  was  used  to  draw  a  smooth  curve  through  prescribed  points. 
The  points  were  marked  by  small  pegs,  and  the  spline  rested  on  the  pegs.  The  mathematical 
theory of splines was first developed in the 1940s by the Romanian-American mathematician
Isaac  Schoenberg  as  an  attractive  alternative  to  polynomial  interpolation  and  approximation. 
Splines  have  since  become  ubiquitous  in  numerical  analysis,  in  geometric  modeling,  in 
design  and  manufacturing,  in  computer  graphics  and  animation,  and  in  many  other 
applications. 

We suppose that the spline coincides with the graph of a function $y = u(x)$. The pegs
are fixed at the prescribed data points $(x_0, y_0), \ldots, (x_n, y_n)$, and this requires $u(x)$ to satisfy
the interpolation conditions

$$u(x_j) = y_j, \qquad j = 0, \ldots, n. \tag{5.82}$$

The mesh points $x_0 < x_1 < x_2 < \cdots < x_n$ are distinct and labeled in increasing order.
The spline is modeled as an elastic beam, and so satisfies the homogeneous beam equation
$u'''' = 0$, cf. [61, 79]. Therefore,

$$u(x) = a_j + b_j (x - x_j) + c_j (x - x_j)^2 + d_j (x - x_j)^3, \qquad x_j \le x \le x_{j+1}, \quad j = 0, \ldots, n - 1, \tag{5.83}$$

is a piecewise cubic function, meaning that, between successive mesh points, it is a cubic
polynomial, but not necessarily the same cubic on each subinterval. The fact that we write
the formula (5.83) in terms of $x - x_j$ is merely for computational convenience.

Our problem is to determine the coefficients

$$a_j, \quad b_j, \quad c_j, \quad d_j, \qquad j = 0, \ldots, n - 1.$$

Since there are n subintervals, there is a total of 4n coefficients, and so we require 4n
equations to uniquely prescribe them. First, we need the spline to satisfy the interpolation
conditions (5.82). Since it is defined by a different formula on each side of the mesh point,
this results in a total of 2n conditions:

$$u(x_j^+) = a_j = y_j, \qquad u(x_{j+1}^-) = a_j + b_j h_j + c_j h_j^2 + d_j h_j^3 = y_{j+1}, \qquad j = 0, \ldots, n - 1, \tag{5.84}$$

where we abbreviate the length of the jth subinterval by

$$h_j = x_{j+1} - x_j.$$

The next step is to require that the spline be as smooth as possible. The interpolation
conditions (5.84) guarantee that $u(x)$ is continuous. The condition that $u(x) \in C^1$ be
continuously differentiable requires that $u'(x)$ be continuous at the interior mesh points
$x_1, \ldots, x_{n-1}$, which imposes the n - 1 additional conditions

$$b_j + 2 c_j h_j + 3 d_j h_j^2 = u'(x_{j+1}^-) = u'(x_{j+1}^+) = b_{j+1}, \qquad j = 0, \ldots, n - 2. \tag{5.85}$$

To make $u \in C^2$, we impose n - 1 further conditions

$$2 c_j + 6 d_j h_j = u''(x_{j+1}^-) = u''(x_{j+1}^+) = 2 c_{j+1}, \qquad j = 0, \ldots, n - 2, \tag{5.86}$$

to ensure that $u''$ is continuous at the mesh points. We have now imposed a total of
4n - 2 conditions, namely (5.84-86), on the 4n coefficients. The two missing constraints
will come from boundary conditions at the two endpoints, namely $x_0$ and $x_n$. There are
three common types:

(i) Natural boundary conditions: $u''(x_0) = u''(x_n) = 0$, whereby

$$c_0 = 0, \qquad c_{n-1} + 3 d_{n-1} h_{n-1} = 0. \tag{5.87}$$

Physically, this models a simply supported spline that rests freely on the first and last pegs.

(ii) Clamped boundary conditions: $u'(x_0) = \alpha$, $u'(x_n) = \beta$, where $\alpha, \beta$, which could be
0, are specified by the user. This requires

$$b_0 = \alpha, \qquad b_{n-1} + 2 c_{n-1} h_{n-1} + 3 d_{n-1} h_{n-1}^2 = \beta. \tag{5.88}$$

This corresponds to clamping the spline at prescribed angles at each end.

(iii) Periodic boundary conditions: $u'(x_0) = u'(x_n)$, $u''(x_0) = u''(x_n)$, so that

$$b_0 = b_{n-1} + 2 c_{n-1} h_{n-1} + 3 d_{n-1} h_{n-1}^2, \qquad c_0 = c_{n-1} + 3 d_{n-1} h_{n-1}. \tag{5.89}$$

The periodic case is used to draw smooth closed curves; see below.


Theorem 5.25. Given mesh points $a = x_0 < x_1 < \cdots < x_n = b$, and corresponding data
values $y_0, y_1, \ldots, y_n$, along with one of the three kinds of boundary conditions (5.87), (5.88),
or (5.89), there exists a unique piecewise cubic spline function $u(x) \in C^2[a, b]$ that
interpolates the data, $u(x_0) = y_0, \ldots, u(x_n) = y_n$, and satisfies the boundary conditions.

Proof: We first discuss the natural case. The clamped case is left as an exercise for the
reader, while the slightly harder periodic case will be treated at the end of the section.
The first set of equations in (5.84) says that

$$a_j = y_j, \qquad j = 0, \ldots, n - 1. \tag{5.90}$$

Next, (5.86-87) imply that

$$d_j = \frac{c_{j+1} - c_j}{3 h_j}. \tag{5.91}$$

This equation also holds for $j = n - 1$, provided that we make the convention that† $c_n = 0$.

We now substitute (5.90-91) into the second set of equations in (5.84), and then solve the
resulting equation for

$$b_j = \frac{y_{j+1} - y_j}{h_j} - \frac{(2 c_j + c_{j+1})\,h_j}{3}. \tag{5.92}$$

Substituting this result and (5.91) back into (5.85), and simplifying, we obtain

$$h_j c_j + 2 (h_j + h_{j+1})\,c_{j+1} + h_{j+1} c_{j+2} = 3 \left[ \frac{y_{j+2} - y_{j+1}}{h_{j+1}} - \frac{y_{j+1} - y_j}{h_j} \right] = z_{j+1}, \tag{5.93}$$

where we introduce $z_{j+1}$ as a shorthand for the quantity on the right-hand side.
In the case of natural boundary conditions, we have

$$c_0 = 0, \qquad c_n = 0,$$

and so (5.93) constitutes a tridiagonal linear system

$$A c = z, \tag{5.94}$$

for the unknown coefficients $c = (c_1, c_2, \ldots, c_{n-1})^T$, with coefficient matrix

$$A = \begin{pmatrix}
2(h_0 + h_1) & h_1 & & & & \\
h_1 & 2(h_1 + h_2) & h_2 & & & \\
 & h_2 & 2(h_2 + h_3) & h_3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & h_{n-3} & 2(h_{n-3} + h_{n-2}) & h_{n-2} \\
 & & & & h_{n-2} & 2(h_{n-2} + h_{n-1})
\end{pmatrix} \tag{5.95}$$

and right-hand side $z = (z_1, z_2, \ldots, z_{n-1})^T$. Once (5.94) has been solved, we will then use
(5.90-92) to reconstruct the other spline coefficients $a_j, b_j, d_j$.

The key observation is that the coefficient matrix A is strictly diagonally dominant,
meaning that each diagonal entry is strictly greater than the sum of the absolute values of the
other entries in its row:

$$2 (h_{j-1} + h_j) > h_{j-1} + h_j.$$

Theorem 8.19 below implies that A is nonsingular, and hence the tridiagonal linear system
has a unique solution c. This suffices to prove the theorem in the case of natural boundary
conditions. Q.E.D.
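The proof is constructive, and its steps (5.90) through (5.94) translate directly into a short program. The following is a minimal sketch in plain Python, assuming the tridiagonal system is solved by the usual forward elimination and back substitution without pivoting (safe here because of diagonal dominance). The function names `natural_spline` and `evaluate` are hypothetical.

```python
import bisect

def natural_spline(xs, ys):
    """Coefficients (a, b, c, d) of the natural cubic spline, following (5.90)-(5.94)."""
    n = len(xs) - 1                                   # number of subintervals
    h = [xs[j + 1] - xs[j] for j in range(n)]
    # Row i of the tridiagonal system corresponds to the unknown c_{i+1}, i = 0..n-2.
    rhs = [3 * ((ys[i + 2] - ys[i + 1]) / h[i + 1] - (ys[i + 1] - ys[i]) / h[i])
           for i in range(n - 1)]                     # z_1, ..., z_{n-1} from (5.93)
    diag = [2 * (h[i] + h[i + 1]) for i in range(n - 1)]
    off = [h[i + 1] for i in range(n - 2)]            # symmetric off-diagonal entries
    for i in range(1, n - 1):                         # forward elimination
        m = off[i - 1] / diag[i - 1]
        diag[i] -= m * off[i - 1]
        rhs[i] -= m * rhs[i - 1]
    c = [0.0] * (n + 1)                               # c_0 = c_n = 0 (natural case)
    for i in reversed(range(n - 1)):                  # back substitution
        upper = off[i] * c[i + 2] if i < n - 2 else 0.0
        c[i + 1] = (rhs[i] - upper) / diag[i]
    a = list(ys[:n])                                                       # (5.90)
    d = [(c[j + 1] - c[j]) / (3 * h[j]) for j in range(n)]                 # (5.91)
    b = [(ys[j + 1] - ys[j]) / h[j] - (2 * c[j] + c[j + 1]) * h[j] / 3
         for j in range(n)]                                                # (5.92)
    return a, b, c[:n], d

def evaluate(xs, coeffs, x):
    """Evaluate the spline (5.83) at a point x in [xs[0], xs[-1]]."""
    a, b, c, d = coeffs
    j = max(0, min(len(a) - 1, bisect.bisect_right(xs, x) - 1))
    t = x - xs[j]
    return a[j] + t * (b[j] + t * (c[j] + t * d[j]))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 2.0, -1.0, 1.0, 0.0]
coeffs = natural_spline(xs, ys)
print([evaluate(xs, coeffs, x) for x in xs])
```

The sample data are the five points (0, 0), (1, 2), (2, -1), (3, 1), (4, 0); evaluating at the mesh points should reproduce the data values up to rounding.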


† This is merely for convenience; there is no $c_n$ used in the formula for the spline.

To actually solve the linear system (5.94), we can apply our tridiagonal solution algorithm
(1.68). Let us specialize to the most important case, in which the mesh points are equally
spaced in the interval [a, b], so that

$$x_j = a + j\,h, \qquad \text{where} \qquad h = h_j = \frac{b - a}{n}, \qquad j = 0, \ldots, n - 1.$$

In this case, the coefficient matrix $A = hB$ is equal to h times the tridiagonal matrix

$$B = \begin{pmatrix}
4 & 1 & & & \\
1 & 4 & 1 & & \\
 & 1 & 4 & 1 & \\
 & & 1 & 4 & 1 \\
 & & & \ddots & \ddots & \ddots
\end{pmatrix}$$

that first appeared in Example 1.37. Its LU factorization takes on a rather simple form,
since most of the entries of L and U are essentially the same, modulo rounding error. This
makes the implementation of the Forward and Back Substitution procedures almost trivial.

Figure 5.14 shows a particular example: a natural spline passing through the data
points (0, 0), (1, 2), (2, -1), (3, 1), (4, 0). The human eye is unable to discern the discontinuities
in its third derivatives, and so the graph appears completely smooth, even though it is, in
fact, only $C^2$.

In the periodic case, we set

$$a_{n+k} = a_k, \qquad b_{n+k} = b_k, \qquad c_{n+k} = c_k, \qquad d_{n+k} = d_k, \qquad z_{n+k} = z_k,$$

and likewise $h_{n+k} = h_k$. With this convention, the basic equations (5.90-93) are the same. In this case, the
coefficient matrix for the linear system

$$A c = z, \qquad \text{with} \qquad c = (c_0, c_1, \ldots, c_{n-1})^T, \quad z = (z_0, z_1, \ldots, z_{n-1})^T,$$

is of tricirculant form:

$$A = \begin{pmatrix}
2(h_{n-1} + h_0) & h_0 & & & & h_{n-1} \\
h_0 & 2(h_0 + h_1) & h_1 & & & \\
 & h_1 & 2(h_1 + h_2) & h_2 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & h_{n-3} & 2(h_{n-3} + h_{n-2}) & h_{n-2} \\
h_{n-1} & & & & h_{n-2} & 2(h_{n-2} + h_{n-1})
\end{pmatrix}. \tag{5.96}$$

Figure 5.15. Three Sample Spline Letters.


Again A is strictly diagonally dominant, and so there is a unique solution c, from which one
reconstructs the spline, proving Theorem 5.25 in the periodic case. The LU factorization
of tricirculant matrices was discussed in Exercise 1.7.14.
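The structure of the tricirculant matrix (5.96) and its diagonal dominance can be checked mechanically. The sketch below assembles the matrix from a list of subinterval lengths, using Python's negative indexing for the periodic wrap-around, and tests the dominance condition row by row. The function names and the sample lengths are assumptions for illustration, and at least three subintervals are assumed so that the two off-diagonal entries in each row occupy different columns.

```python
def tricirculant(h):
    """Assemble the matrix (5.96) from subinterval lengths h[0], ..., h[n-1], n >= 3."""
    n = len(h)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2 * (h[i - 1] + h[i])   # h[-1] is h[n-1], the periodic wrap-around
        A[i][(i + 1) % n] = h[i]          # coefficient of c_{i+1}
        A[i][(i - 1) % n] = h[i - 1]      # coefficient of c_{i-1}
    return A

def strictly_diagonally_dominant(A):
    """True if every |a_ii| exceeds the sum of |a_ij| over j != i."""
    return all(
        abs(row[i]) > sum(abs(v) for j, v in enumerate(row) if j != i)
        for i, row in enumerate(A)
    )

h = [1.0, 0.5, 2.0, 1.5]                  # hypothetical unequal mesh spacings
A = tricirculant(h)
for row in A:
    print(row)
print(strictly_diagonally_dominant(A))
```

Each diagonal entry is twice the sum of the two off-diagonal entries in its row, so the check succeeds for any positive spacings.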

One immediate application of splines is curve fitting in computer aided design and
graphics. The basic problem is to draw a smooth parameterized curve $\mathbf{u}(t) = (u(t), v(t))^T$
that passes through a set of prescribed data points $\mathbf{x}_k = (x_k, y_k)^T$ in the plane. We have
the freedom to choose the parameter value $t = t_k$ when the curve passes through the kth
point; the simplest and most common choice is to set $t_k = k$. We then construct the
functions $x = u(t)$ and $y = v(t)$ as cubic splines interpolating the x and y coordinates of
the data points, so $u(t_k) = x_k$, $v(t_k) = y_k$. For smooth closed curves, we require that both
splines be periodic; for curves with ends, either natural or clamped boundary conditions
are used.


Most  computer  graphics  packages  include  one  or  more  implementations  of  parameterized 
spline  curves.  The  same  idea  also  underlies  modern  font  design  for  laser  printing  and 
typography  (including  the  fonts  used  in  this  book).  The  great  advantage  of  spline  fonts 
over  their  bitmapped  counterparts  is  that  they  can  be  readily  scaled.  Some  sample  letter 
shapes  parameterized  by  periodic  splines  passing  through  the  indicated  data  points  are 
plotted  in  Figure  5.15.  Better  fits  can  be  easily  obtained  by  increasing  the  number  of  data 
points.  Various  extensions  of  the  basic  spline  algorithms  to  space  curves  and  surfaces  are 
an  essential  component  of  modern  computer  graphics,  design,  and  animation,  [25,  74]. 


Exercises 

5.5.67. Find and graph the natural cubic spline interpolant for the following data:

    (a)  x | -1   0   1          (b)  x |  0   1   2   3
         y | -2   1  -1               y |  1   2   0   1

    (c)  x |  1   2   4          (d)  x | -2  -1   0   1   2
         y |  3   0   2               y |  5   2   3  -1   1

5.5.68. Repeat Exercise 5.5.67 when the spline has homogeneous clamped boundary conditions.

5.5.69. Find and graph the periodic cubic spline that interpolates the following data:

    (a)  x |  0   1   2   3          (b)  x |  0   1   2   3
         y |  1   0   0   1               y |  1   2   0   1

    (c)  x |  0   1   2   3   4      (d)  x | -2  -1   0   1   2
         y |  1   0   0   0   1           y |  1   2  -2  -1   1

4b  5.5.70.  (a)  Given  the  known  values  of  sinx  at  x  =  0°,  30°,  45°,  60°,  construct  the  natural  cubic 
spline  interpolant.  (b)  Compare  the  accuracy  of  the  spline  with  the  least  squares  and 
interpolating  polynomials  you  found  in  Exercise  5.5.21. 

4» 5.5.71. (a) Using the exact values for $\sqrt{x}$ at x = 0, 1, construct the natural cubic
spline interpolant. (b) What is the maximal error of the spline on the interval [0, 1]?
(c) Compare the error with that of the interpolating cubic polynomial you found in
Exercise 5.5.23. Which is the better approximation? (d) Answer part (c) using the cubic
least squares approximant based on the $L^2$ norm on [0, 1].

4b  5.5.72.  According  to  Figure  5.9,  the  interpolating  polynomials  for  the  function  1/(1  +  x2)  on 
the  interval  [  —  3,  3]  based  on  equally  spaced  mesh  points  are  very  inaccurate  near  the  ends 
of  the  interval.  Does  the  natural  spline  interpolant  based  on  the  same  3,  5,  and  11  data 
points  exhibit  the  same  inaccuracy? 

5.5.73.  (a)  Draw  outlines  of  the  block  capital  letters  I,  C,  S,  and  Y  on  a  sheet  of  graph  paper. 
Fix  several  points  on  the  graphs  and  measure  their  x  and  y  coordinates,  (b)  Use  periodic 
cubic  splines  x  =  u(t),y  =  v(t)  to  interpolate  the  coordinates  of  the  data  points  using 
equally  spaced  nodes  for  the  parameter  values  tk.  Graph  the  resulting  spline  letters,  and 
discuss  how  the  method  could  be  used  in  font  design.  To  get  nicer  results,  you  may  wish  to 
experiment  with  different  numbers  and  locations  for  the  points. 


4»  5.5.74.  Repeat  Exercise  5.5.73,  using  the  Lagrange  interpolating  polynomials  instead  of 

splines  to  parameterize  the  curves.  Compare  the  two  methods  and  discuss  advantages  and 
disadvantages. 

C 5.5.75. Let $x_0 < x_1 < \cdots < x_n$. For each $j = 0, \ldots, n$, the cardinal spline $C_j(x)$ is defined
to be the natural cubic spline interpolating the Lagrange data

$$y_0 = 0, \quad y_1 = 0, \quad \ldots, \quad y_{j-1} = 0, \quad y_j = 1, \quad y_{j+1} = 0, \quad \ldots, \quad y_n = 0.$$

(a) Construct and graph the natural cardinal splines corresponding to the nodes $x_0 = 0$,
$x_1 = 1$, $x_2 = 2$, and $x_3 = 3$. (b) Prove that the natural spline that interpolates the data
$y_0, \ldots, y_n$ can be uniquely written as a linear combination $u(x) = y_0 C_0(x) + y_1 C_1(x) + \cdots + y_n C_n(x)$
of the cardinal splines. (c) Explain why the space of natural splines on
n + 1 nodes is a vector space of dimension n + 1. (d) Discuss briefly what modifications are
required to adapt this method to periodic and to clamped splines.


C 5.5.76. A bell-shaped or B-spline $u = \beta(x)$ interpolates the data

$$\beta(-2) = 0, \quad \beta(-1) = 1, \quad \beta(0) = 4, \quad \beta(1) = 1, \quad \beta(2) = 0.$$

(a) Find the explicit formula for the natural B-spline and plot its graph. (b) Show that
$\beta(x)$ also satisfies the homogeneous clamped boundary conditions $u'(-2) = u'(2) = 0$.
(c) Show that $\beta(x)$ also satisfies the periodic boundary conditions. Thus, for this particular
interpolation problem, the natural, clamped, and periodic splines happen to coincide.
(d) Show that

$$\beta^*(x) = \begin{cases} \beta(x), & -2 < x < 2, \\ 0, & \text{otherwise}, \end{cases}$$

defines a $C^2$ spline on every interval $[-k, k]$.


C 5.5.77. Let $\beta(x)$ denote the B-spline function of Exercise 5.5.76. Assuming $n > 4$, let $P_n$
denote the vector space of periodic cubic splines based on the integer nodes $x_j = j$
for $j = 0, \ldots, n$. (a) Prove that the B-splines $B_j(x) = \beta\bigl((x - j - m) \bmod n + m\bigr)$,
$j = 0, \ldots, n - 1$, where m denotes the integer part of n/2, form a basis for $P_n$. (b) Graph
the basis periodic B-splines in the case n = 5. (c) Let u(x) denote the periodic spline
interpolant for the data values $y_0, \ldots, y_{n-1}$. Explain how to write
$u(x) = \alpha_0 B_0(x) + \cdots + \alpha_{n-1} B_{n-1}(x)$ in terms of the B-splines by solving a linear system for the coefficients
$\alpha_0, \ldots, \alpha_{n-1}$. (d) Write the periodic spline with $y_0 = y_5 = 0$, $y_1 = 2$, $y_2 = 1$, $y_3 = -1$,
$y_4 = -2$, as a linear combination of the periodic basis B-splines $B_0(x), \ldots, B_4(x)$. Plot the
resulting periodic spline function.

In modern digital media (audio, still images, video, etc.), continuous signals are sampled
at  discrete  time  intervals  before  being  processed.  Fourier  analysis  decomposes  the  sampled 
signal  into  its  fundamental  periodic  constituents  —  sines  and  cosines,  or,  more  conveniently, 
complex  exponentials.  The  crucial  fact,  upon  which  all  of  modern  signal  processing  is 
based,  is  that  the  sampled  complex  exponentials  form  an  orthogonal  basis.  This  section 
introduces  the  Discrete  Fourier  Transform,  and  concludes  with  an  introduction  to  the  justly 
famous  Fast  Fourier  Transform,  an  efficient  algorithm  for  computing  the  discrete  Fourier 
representation  and  reconstructing  the  signal  from  its  Fourier  coefficients. 

We will concentrate on the one-dimensional version here. Let f(x) be a function
representing the signal, defined on an interval a ≤ x ≤ b. Our computer can store its
measured values only at a finite number of sample points $a \le x_0 < x_1 < \cdots < x_n \le b$.
In the simplest, and by far the most common, case, the sample points are equally spaced,
and so

$$x_j = a + j\,h, \qquad j = 0, \ldots, n, \qquad \text{where} \qquad h = \frac{b - a}{n}$$

indicates the sample rate. In signal processing applications, x represents time instead of
space, and the $x_j$ are the times at which we sample the signal f(x). Sample rates can be
very high, e.g., every 10-20 milliseconds in current speech recognition systems.

For simplicity, we adopt the “standard” interval of 0 ≤ x ≤ 2π, and the n equally
spaced sample points†

$$x_0 = 0, \quad x_1 = \frac{2\pi}{n}, \quad x_2 = \frac{4\pi}{n}, \quad \ldots, \quad x_{n-1} = \frac{2(n-1)\pi}{n}. \tag{5.97}$$


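(Signals defined on other intervals can be handled by simply rescaling the interval to have
length 2π.) Sampling a (complex-valued) signal or function f(x) produces the sample
vector

$$\mathbf{f} = (f_0, f_1, \ldots, f_{n-1})^T, \qquad \text{where} \qquad f_j = f(x_j) = f\!\left(\frac{2j\pi}{n}\right), \qquad j = 0, \ldots, n - 1. \tag{5.98}$$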


† We will find it convenient to omit the final sample point $x_n = 2\pi$ from consideration.

Figure 5.16. Sampling $e^{-ix}$ and $e^{7ix}$ on n = 8 sample points.

Sampling cannot distinguish between functions that have the same values at all of the
sample points: from the sampler’s point of view they are identical. For example, the
periodic complex exponential function

$$f(x) = e^{inx} = \cos nx + i \sin nx$$

has sampled values

$$f_j = f\!\left(\frac{2j\pi}{n}\right) = \exp\!\left(i\,n\,\frac{2j\pi}{n}\right) = e^{2j\pi i} = 1 \qquad \text{for all} \quad j = 0, \ldots, n - 1,$$


and hence is indistinguishable from the constant function c(x) = 1: both lead to the
same sample vector $(1, 1, \ldots, 1)^T$. This has the important implication that sampling at n
equally spaced sample points cannot detect periodic signals of frequency n. More generally,
the two complex exponential signals

$$e^{i(k+n)x} \qquad \text{and} \qquad e^{ikx}$$


are also indistinguishable when sampled. Consequently, we need only use the first n periodic
complex exponential functions

$$f_0(x) = 1, \quad f_1(x) = e^{ix}, \quad f_2(x) = e^{2ix}, \quad \ldots, \quad f_{n-1}(x) = e^{(n-1)ix}, \tag{5.99}$$

in order to represent any 2π-periodic sampled signal. In particular, exponentials $e^{-ikx}$ of
“negative” frequency can all be converted into positive versions, namely $e^{i(n-k)x}$, by the
same sampling argument. For example,

$$e^{-ix} = \cos x - i \sin x \qquad \text{and} \qquad e^{(n-1)ix} = \cos(n-1)x + i \sin(n-1)x$$

have  identical  values  on  the  sample  points  (5.97).  However,  off  of  the  sample  points,  they 
are  quite  different;  the  former  is  slowly  varying,  while  the  latter  represents  a  high-frequency 
oscillation. In Figure 5.16, we compare $e^{-ix}$ and $e^{7ix}$ when there are n = 8 sample values,
indicated by the dots on the graphs. The top row compares the real parts, cos x and cos 7x,
while the bottom row compares the imaginary parts, sin x and −sin 7x. Note that both
functions  have  the  same  pattern  of  sample  values,  even  though  their  overall  behavior  is 
strikingly  different. 
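
The aliasing of $e^{-ix}$ and $e^{7ix}$ on n = 8 points can be checked numerically. The following is a minimal sketch in plain Python; the helper names `sample_points` and `sample_exponential` are illustrative choices, not notation from the text.

```python
import cmath
import math

def sample_points(n):
    """Equally spaced sample points x_j = 2*pi*j/n, j = 0, ..., n-1, as in (5.97)."""
    return [2 * math.pi * j / n for j in range(n)]

def sample_exponential(k, n):
    """Sample vector of e^{ikx} on the n points (5.97)."""
    return [cmath.exp(1j * k * x) for x in sample_points(n)]

n = 8
low = sample_exponential(-1, n)    # samples of e^{-ix}
high = sample_exponential(7, n)    # samples of e^{7ix}

# On the sample points the two vectors agree up to rounding error.
max_gap = max(abs(a - b) for a, b in zip(low, high))

# Between sample points the functions differ, e.g. halfway between x_0 and x_1.
x_mid = math.pi / 8
gap_between = abs(cmath.exp(-1j * x_mid) - cmath.exp(7j * x_mid))

print(max_gap, gap_between)
```

The first printed number is at the level of rounding error, while the second is of order one, which is the numerical counterpart of the two curves in Figure 5.16 passing through the same dots.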

This effect is commonly referred to as aliasing.‡ If you view a moving particle under a
stroboscopic light that flashes only eight times, you would be unable to determine which
of the two graphs the particle was following. Aliasing is the cause of a well-known artifact
in movies: spoked wheels can appear to be rotating backwards when our brain interprets
the discretization of the high-frequency forward motion imposed by the frames of the film
as an equivalently discretized low-frequency motion in reverse. Aliasing also has important
implications for the design of music CDs. We must sample an audio signal at a sufficiently
high rate that all audible frequencies can be adequately represented. In fact, human
appreciation of music also relies on inaudible high-frequency tones, and so a much higher
sample rate is actually used in commercial CD design. But the sample rate that was
selected remains controversial; hi-fi aficionados complain that it was not set high enough
to fully reproduce the musical quality of an analog LP record!

‡ In computer graphics, the term “aliasing” is used in a much broader sense that covers a variety
of artifacts introduced by discretization, particularly the jagged appearance of lines and smooth
curves on a digital monitor.

The discrete Fourier representation decomposes a sampled function f(x) into a linear
combination of complex exponentials. Since we cannot distinguish sampled exponentials
of frequency higher than n, we only need consider a finite linear combination

$$f(x) \sim p(x) = c_0 + c_1 e^{ix} + c_2 e^{2ix} + \cdots + c_{n-1} e^{(n-1)ix} = \sum_{k=0}^{n-1} c_k e^{ikx} \tag{5.100}$$

of the first n exponentials (5.99). The symbol $\sim$ in (5.100) means that the function f(x)
and the sum p(x) agree on the sample points:

$$f(x_j) = p(x_j), \qquad j = 0, \ldots, n - 1. \tag{5.101}$$

Therefore, p(x) can be viewed as a (complex-valued) interpolating trigonometric polynomial
of degree ≤ n − 1 for the sample data $f_j = f(x_j)$.

Remark. If f(x) is real, then p(x) is also real on the sample points, but may very well
be complex-valued in between. To avoid this unsatisfying state of affairs, we will usually
discard its imaginary component, and regard the real part of p(x) as “the” interpolating
trigonometric polynomial. On the other hand, sticking with a purely real construction
unnecessarily complicates the underlying mathematical analysis, and so we will retain the
complex exponential form (5.100) of the discrete Fourier sum.

Since we are working in the finite-dimensional complex vector space $\mathbb{C}^n$ throughout,
we can reformulate the discrete Fourier series in vectorial form. Sampling the basic
exponentials (5.99) produces the complex vectors

$$\omega_k = \bigl(e^{ikx_0}, e^{ikx_1}, e^{ikx_2}, \ldots, e^{ikx_{n-1}}\bigr)^T = \bigl(1, e^{2k\pi i/n}, e^{4k\pi i/n}, \ldots, e^{2(n-1)k\pi i/n}\bigr)^T, \qquad k = 0, \ldots, n - 1. \tag{5.102}$$

The interpolation conditions (5.101) can be recast in the equivalent vector form

$$\mathbf{f} = c_0\,\omega_0 + c_1\,\omega_1 + \cdots + c_{n-1}\,\omega_{n-1}. \tag{5.103}$$

In other words, to compute the discrete Fourier coefficients $c_0, \ldots, c_{n-1}$ of f, all we need
to do is rewrite its sample vector $\mathbf{f}$ as a linear combination of the sampled exponential
vectors $\omega_0, \ldots, \omega_{n-1}$.

Now, the absolutely crucial property is the orthonormality of the basis elements
$\omega_0, \ldots, \omega_{n-1}$. Were it not for the power of orthogonality, Fourier analysis might have
remained a mere mathematical curiosity, rather than today’s indispensable tool.

Proposition 5.26. The sampled exponential vectors $\omega_0, \ldots, \omega_{n-1}$ form an orthonormal
basis of $\mathbb{C}^n$ with respect to the inner product

$$\langle \mathbf{f}, \mathbf{g} \rangle = \frac{1}{n} \sum_{j=0}^{n-1} f_j\,\overline{g_j} = \frac{1}{n} \sum_{j=0}^{n-1} f(x_j)\,\overline{g(x_j)}, \qquad \mathbf{f}, \mathbf{g} \in \mathbb{C}^n. \tag{5.104}$$


Remark. The inner product (5.104) is a rescaled version of the standard Hermitian dot
product (3.98) between complex vectors. We can interpret the inner product between the
sample vectors f, g as the average of the sampled values of the product signal $f(x)\,\overline{g(x)}$.

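Proof: The crux of the matter relies on properties of the remarkable complex numbers

$$\zeta_n = e^{2\pi i/n} = \cos\frac{2\pi}{n} + i \sin\frac{2\pi}{n}, \qquad \text{where} \quad n = 1, 2, 3, \ldots. \tag{5.105}$$

Particular cases include

$$\zeta_2 = -1, \qquad \zeta_3 = -\tfrac{1}{2} + \tfrac{\sqrt{3}}{2}\,i, \qquad \zeta_4 = i, \qquad \text{and} \qquad \zeta_8 = \tfrac{\sqrt{2}}{2} + \tfrac{\sqrt{2}}{2}\,i. \tag{5.106}$$

The nth power of $\zeta_n$ is

$$\zeta_n^n = \bigl(e^{2\pi i/n}\bigr)^n = e^{2\pi i} = 1,$$

and hence $\zeta_n$ is one of the complex nth roots of unity: $\zeta_n = \sqrt[n]{1}$. There are, in fact, n
distinct complex nth roots of 1, including 1 itself, namely the powers of $\zeta_n$:

$$\zeta_n^k = e^{2k\pi i/n} = \cos\frac{2k\pi}{n} + i \sin\frac{2k\pi}{n}, \qquad k = 0, \ldots, n - 1. \tag{5.107}$$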


Since it generates all the others, $\zeta_n$ is known as a primitive nth root of unity. Geometrically,
the nth roots (5.107) are the vertices of a regular unit n-gon inscribed in the unit circle
|z| = 1; see Figure 5.17 for the case n = 5, where the roots form the vertices of a regular
pentagon. The primitive root $\zeta_n$ is the first vertex we encounter as we go around the n-gon
in a counterclockwise direction, starting at 1. Continuing around, the other roots appear
in their natural order $\zeta_n^2, \zeta_n^3, \ldots, \zeta_n^{n-1}$, cycling back to $\zeta_n^n = 1$. The complex conjugate of
$\zeta_n$ is the “last” nth root:

$$\overline{\zeta_n} = e^{-2\pi i/n} = \frac{1}{\zeta_n} = \zeta_n^{n-1} = e^{2(n-1)\pi i/n}. \tag{5.108}$$

The complex numbers (5.107) are a complete set of roots of the polynomial $z^n - 1$,
which can therefore be completely factored:

$$z^n - 1 = (z - 1)(z - \zeta_n)(z - \zeta_n^2) \cdots (z - \zeta_n^{n-1}).$$

On the other hand, elementary algebra provides us with the real factorization

$$z^n - 1 = (z - 1)(1 + z + z^2 + \cdots + z^{n-1}).$$

Comparing the two, we conclude that

$$1 + z + z^2 + \cdots + z^{n-1} = (z - \zeta_n)(z - \zeta_n^2) \cdots (z - \zeta_n^{n-1}).$$

Substituting $z = \zeta_n^k$ into both sides of this identity, we deduce the useful formula

$$1 + \zeta_n^k + \zeta_n^{2k} + \cdots + \zeta_n^{(n-1)k} = \begin{cases} n, & k = 0, \\ 0, & 0 < k < n. \end{cases} \tag{5.109}$$

Since $\zeta_n^{n+k} = \zeta_n^k$, this formula can easily be extended to general integers k; the sum is equal
to n if n evenly divides k and is 0 otherwise.
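
As a quick numerical sanity check of (5.109) and its extension to general integers k, one can sum the powers of the primitive root directly. This is only an illustrative sketch; the function name `root_power_sum` and the choice n = 5 are assumptions made for the example.

```python
import cmath

def root_power_sum(n, k):
    """Return 1 + zeta^k + zeta^(2k) + ... + zeta^((n-1)k), where zeta = e^{2 pi i / n}."""
    return sum(cmath.exp(2j * cmath.pi * j * k / n) for j in range(n))

n = 5
# Tabulate the sum for a range of integers k, including negative ones.
sums = {k: root_power_sum(n, k) for k in range(-10, 11)}

for k in sorted(sums):
    # Rounded for display; the sum is n when n divides k and 0 otherwise.
    print(k, round(abs(sums[k]), 10))
```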

Now, let us apply what we’ve learned to prove Proposition 5.26. First, in view of (5.107),
the sampled exponential vectors (5.102) can all be written in terms of the nth roots of unity:

$$\omega_k = \bigl(1, \zeta_n^k, \zeta_n^{2k}, \zeta_n^{3k}, \ldots, \zeta_n^{(n-1)k}\bigr)^T, \qquad k = 0, \ldots, n - 1. \tag{5.110}$$

Therefore, applying (5.108, 109), we conclude that

$$\langle \omega_k, \omega_l \rangle = \frac{1}{n} \sum_{j=0}^{n-1} \zeta_n^{jk}\,\overline{\zeta_n^{jl}} = \frac{1}{n} \sum_{j=0}^{n-1} \zeta_n^{j(k-l)} = \begin{cases} 1, & k = l, \\ 0, & k \ne l, \end{cases} \qquad 0 \le k, l < n,$$

which establishes orthonormality of the sampled exponential vectors. Q.E.D.
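
Proposition 5.26 can also be verified numerically for any particular n by forming the matrix of inner products (5.104) between the sampled exponential vectors. The sketch below does this for n = 6; the names `omega`, `inner`, and `gram` are illustrative.

```python
import cmath

def omega(k, n):
    """Sampled exponential vector (5.110): entries zeta_n^(jk), j = 0, ..., n-1."""
    return [cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)]

def inner(f, g):
    """Averaged Hermitian inner product (5.104)."""
    return sum(a * b.conjugate() for a, b in zip(f, g)) / len(f)

n = 6
# gram[k][l] = <omega_k, omega_l>; orthonormality means this is the identity matrix.
gram = [[inner(omega(k, n), omega(l, n)) for l in range(n)] for k in range(n)]

for row in gram:
    print([round(abs(entry), 10) for entry in row])
```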


Orthonormality of the basis vectors implies that we can immediately compute the
Fourier coefficients in the discrete Fourier sum (5.100) by taking inner products:

$$c_k = \langle \mathbf{f}, \omega_k \rangle = \frac{1}{n} \sum_{j=0}^{n-1} f_j\,\overline{e^{ikx_j}} = \frac{1}{n} \sum_{j=0}^{n-1} f_j\,e^{-ikx_j} = \frac{1}{n} \sum_{j=0}^{n-1} \zeta_n^{-jk} f_j. \tag{5.111}$$

In other words, the discrete Fourier coefficient $c_k$ is obtained by averaging the sampled
values of the product function $f(x)\,e^{-ikx}$. The passage from a signal to its Fourier
coefficients  is  known  as  the  Discrete  Fourier  Transform  or  DFT  for  short.  The  reverse 
procedure  of  reconstructing  a  signal  from  its  discrete  Fourier  coefficients  via  the  sum  (5.100) 
(or  (5.103))  is  known  as  the  Inverse  Discrete  Fourier  Transform  or  IDFT. 
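
A direct, unoptimized transcription of (5.111) and (5.100) might look as follows. This is a sketch under the normalization used here, with the factor 1/n in the forward transform; many numerical libraries instead place that factor on the inverse by default, so coefficients may differ by a factor of n. The function names `dft` and `idft` and the sample data are illustrative.

```python
import cmath

def dft(f):
    """Discrete Fourier coefficients c_k = (1/n) * sum_j f_j e^{-i k x_j}, x_j = 2 pi j / n."""
    n = len(f)
    return [sum(f[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]

def idft(c):
    """Reconstruct the samples f_j = sum_k c_k e^{i k x_j}, as in (5.100)."""
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

samples = [1.0, 2.0, 0.0, -1.0, 3.5]   # arbitrary illustrative data
coeffs = dft(samples)
recovered = idft(coeffs)

# The round trip returns the original samples up to rounding error.
print(max(abs(r - s) for r, s in zip(recovered, samples)))
```

Both routines use on the order of n² operations, which is the cost that the Fast Fourier Transform later reduces.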

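Example 5.27. If n = 4, then $\zeta_4 = i$. The corresponding sampled exponential vectors

$$\omega_0 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \qquad \omega_1 = \begin{pmatrix} 1 \\ i \\ -1 \\ -i \end{pmatrix}, \qquad \omega_2 = \begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}, \qquad \omega_3 = \begin{pmatrix} 1 \\ -i \\ -1 \\ i \end{pmatrix},$$

form an orthonormal basis of $\mathbb{C}^4$ with respect to the averaged Hermitian dot product

$$\langle v, w \rangle = \tfrac{1}{4} \bigl( v_0\,\overline{w_0} + v_1\,\overline{w_1} + v_2\,\overline{w_2} + v_3\,\overline{w_3} \bigr), \qquad \text{where} \qquad v = \begin{pmatrix} v_0 \\ v_1 \\ v_2 \\ v_3 \end{pmatrix}, \qquad w = \begin{pmatrix} w_0 \\ w_1 \\ w_2 \\ w_3 \end{pmatrix}.$$

Figure 5.18. The Discrete Fourier Representation of $2\pi x - x^2$.

Given the sampled function values

$$f_0 = f(0), \qquad f_1 = f\bigl(\tfrac{1}{2}\pi\bigr), \qquad f_2 = f(\pi), \qquad f_3 = f\bigl(\tfrac{3}{2}\pi\bigr),$$

we construct the discrete Fourier representation

$$\mathbf{f} = c_0\,\omega_0 + c_1\,\omega_1 + c_2\,\omega_2 + c_3\,\omega_3, \tag{5.112}$$

where

$$c_0 = \langle \mathbf{f}, \omega_0 \rangle = \tfrac{1}{4}(f_0 + f_1 + f_2 + f_3), \qquad c_1 = \langle \mathbf{f}, \omega_1 \rangle = \tfrac{1}{4}(f_0 - i f_1 - f_2 + i f_3),$$

$$c_2 = \langle \mathbf{f}, \omega_2 \rangle = \tfrac{1}{4}(f_0 - f_1 + f_2 - f_3), \qquad c_3 = \langle \mathbf{f}, \omega_3 \rangle = \tfrac{1}{4}(f_0 + i f_1 - f_2 - i f_3).$$

We interpret this decomposition as the complex exponential interpolant

$$f(x) \sim p(x) = c_0 + c_1 e^{ix} + c_2 e^{2ix} + c_3 e^{3ix}$$

that agrees with f(x) on the 4 sample points.

For instance, if

$$f(x) = 2\pi x - x^2,$$

then

$$f_0 = 0, \qquad f_1 = 7.4022, \qquad f_2 = 9.8696, \qquad f_3 = 7.4022,$$

and hence

$$c_0 = 6.1685, \qquad c_1 = -2.4674, \qquad c_2 = -1.2337, \qquad c_3 = -2.4674.$$

Therefore, the interpolating trigonometric polynomial is given by the real part of

$$p_4(x) = 6.1685 - 2.4674\,e^{ix} - 1.2337\,e^{2ix} - 2.4674\,e^{3ix}, \tag{5.113}$$

namely,

$$\operatorname{Re}\,p_4(x) = 6.1685 - 2.4674 \cos x - 1.2337 \cos 2x - 2.4674 \cos 3x. \tag{5.114}$$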
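
The numbers in this example can be reproduced with a few lines of code. The sketch below evaluates (5.111) for $f(x) = 2\pi x - x^2$ with n = 4 and checks that the resulting sum interpolates the samples; the variable names `xs`, `fs`, `cs` and the function `p4` are chosen here for illustration.

```python
import cmath
import math

def f(x):
    return 2 * math.pi * x - x ** 2

n = 4
xs = [2 * math.pi * j / n for j in range(n)]   # sample points 0, pi/2, pi, 3pi/2
fs = [f(x) for x in xs]                        # sampled values f_0, ..., f_3

# Discrete Fourier coefficients from (5.111).
cs = [sum(fs[j] * cmath.exp(-1j * k * xs[j]) for j in range(n)) / n for k in range(n)]

def p4(x):
    """Complex exponential interpolant c_0 + c_1 e^{ix} + c_2 e^{2ix} + c_3 e^{3ix}."""
    return sum(cs[k] * cmath.exp(1j * k * x) for k in range(n))

print([round(v, 4) for v in fs])
print([round(c.real, 4) for c in cs])
```

The coefficients come out real (up to rounding) because the sampled values satisfy $f_1 = f_3$.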


In Figure 5.18, we compare the function, with the interpolation points indicated, and its
discrete Fourier representations (5.114) for both n = 4 in the first row, and n = 16 points
in the second. The resulting graphs point out a significant difficulty with the Discrete
Fourier Transform as developed so far. While the trigonometric polynomials do indeed
correctly match the sampled function values, their pronounced oscillatory behavior makes
them completely unsuitable for interpolation away from the sample points.

Figure 5.19. The Low-Frequency Discrete Fourier Representation of $x^2 - 2\pi x$.


However,  this  difficulty  can  be  rectified  by  being  a  little  more  clever.  The  problem  is 
that  we  have  not  been  paying  sufficient  attention  to  the  frequencies  that  are  represented 
in  the  Fourier  sum.  Indeed,  the  graphs  in  Figure  5.18  might  remind  you  of  our  earlier 
observation  that,  due  to  aliasing,  low-  and  high-frequency  exponentials  can  have  the  same 
sample  data,  but  differ  wildly  in  between  the  sample  points.  While  the  first  half  of  the 
summands  in  (5.100)  represent  relatively  low  frequencies,  the  second  half  do  not,  and  can  be 
replaced by equivalent lower-frequency, and hence less-oscillatory, exponentials. Namely,
if $0 < k < \tfrac{1}{2} n$, then $e^{-ikx}$ and $e^{i(n-k)x}$ have the same sample values, but the former is of
lower frequency than the latter. Thus, for interpolatory purposes, we should replace the
second half of the summands in the Fourier sum (5.100) by their low-frequency alternatives.
If n = 2m + 1 is odd, then we take

$$\widehat{p}_{2m+1}(x) = c_{-m} e^{-imx} + \cdots + c_{-1} e^{-ix} + c_0 + c_1 e^{ix} + \cdots + c_m e^{imx} = \sum_{k=-m}^{m} c_k e^{ikx} \tag{5.115}$$

as the equivalent low-frequency interpolant. If n = 2m is even, which is the most
common case occurring in applications, then

$$\widehat{p}_{2m}(x) = c_{-m} e^{-imx} + \cdots + c_{-1} e^{-ix} + c_0 + c_1 e^{ix} + \cdots + c_{m-1} e^{i(m-1)x} = \sum_{k=-m}^{m-1} c_k e^{ikx} \tag{5.116}$$

will be our choice. (It is a matter of personal taste whether to use $e^{-imx}$ or $e^{imx}$ to
represent the highest-frequency term.) In both cases, the Fourier coefficients with negative
indices are the same as their high-frequency alternatives:

$$c_{-k} = c_{n-k} = \langle \mathbf{f}, \omega_{n-k} \rangle = \langle \mathbf{f}, \omega_{-k} \rangle, \tag{5.117}$$

where $\omega_{-k} = \omega_{n-k}$ is the sample vector for $e^{-ikx} \sim e^{i(n-k)x}$.

Returning to the previous example, for interpolating purposes, we should replace (5.113)
by the equivalent low-frequency interpolant

$$\widehat{p}_4(x) = -1.2337\,e^{-2ix} - 2.4674\,e^{-ix} + 6.1685 - 2.4674\,e^{ix}, \tag{5.118}$$

with real part

$$\operatorname{Re}\,\widehat{p}_4(x) = 6.1685 - 4.9348 \cos x - 1.2337 \cos 2x.$$

Graphs of the n = 4 and 16 low-frequency trigonometric interpolants can be seen in
Figure  5.19.  Thus,  by  utilizing  only  the  lowest-frequency  exponentials,  we  successfully 
suppress  the  aliasing  artifacts,  resulting  in  a  quite  reasonable  trigonometric  interpolant  to 
the  given  function  on  the  entire  interval. 
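
The re-indexing behind (5.115, 116) amounts to relabeling each coefficient $c_k$ with $2k \ge n$ as $c_{k-n}$. The following sketch, with illustrative helper names, applies this to $f(x) = 2\pi x - x^2$ with n = 16 and compares how far the real parts of the original and low-frequency sums stray from f on a fine grid.

```python
import cmath
import math

def dft(f):
    """Discrete Fourier coefficients (5.111)."""
    n = len(f)
    return [sum(f[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]

def low_frequency_terms(c):
    """Pair each c_k with its unaliased frequency: k itself, or k - n when 2k >= n."""
    n = len(c)
    return [(k - n if 2 * k >= n else k, c[k]) for k in range(n)]

def evaluate(terms, x):
    """Evaluate the sum of c * e^{i k x} over (k, c) pairs."""
    return sum(ck * cmath.exp(1j * k * x) for k, ck in terms)

def f(x):
    return 2 * math.pi * x - x * x

def max_real_error(terms, grid):
    return max(abs(evaluate(terms, x).real - f(x)) for x in grid)

n = 16
c = dft([f(2 * math.pi * j / n) for j in range(n)])
high_terms = list(enumerate(c))          # original form (5.100), frequencies 0..n-1
low_terms = low_frequency_terms(c)       # unaliased form (5.116), frequencies -8..7

grid = [2 * math.pi * t / 400 for t in range(401)]
err_high = max_real_error(high_terms, grid)
err_low = max_real_error(low_terms, grid)
print(err_high, err_low)
```

Both sums agree with f at the 16 sample points; the difference in the printed errors comes entirely from their behavior in between.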

Remark. The low-frequency version also serves to unravel the reality of the Fourier
representation of a real function f(x). Since $\omega_{-k} = \overline{\omega_k}$, formula (5.117) implies that
$c_{-k} = \overline{c_k}$ when f is real, and so the common frequency terms

$$c_{-k} e^{-ikx} + c_k e^{ikx} = a_k \cos kx + b_k \sin kx$$

add up to a real trigonometric function. Therefore, the odd n interpolant (5.115) is a real
trigonometric polynomial, whereas in the even version (5.116) only the highest-frequency
term $c_{-m} e^{-imx}$ produces a complex term, whose imaginary part is, in fact, 0 on the sample points.


Exercises 


5.6.1. Find (i) the discrete Fourier coefficients, and (ii) the low-frequency trigonometric
interpolant, for the following functions using the indicated number of sample points: (a) sin x,
n = 4, (b) $|x - \pi|$, n = 6, (c) $f(x) = \begin{cases} 1, & x \le 2, \\ 0, & x > 2, \end{cases}$ n = 6, (d) sign(x − π), n = 8.


5.6.2. Find (i) the sample values, and (ii) the trigonometric interpolant corresponding to the
following discrete Fourier coefficients: (a) $c_{-1} = c_1 = 1$, $c_0 = 0$,
(b) $c_{-2} = c_0 = c_2 = 1$, $c_{-1} = c_1 = -1$, (c) $c_{-2} = c_0 = c_1 = 2$, $c_{-1} = c_2 = 0$,
(d) $c_0 = c_2 = c_4 = 1$, $c_1 = c_3 = c_5 = -1$.

♠ 5.6.3. Let f(x) = x. Compute its discrete Fourier coefficients based on n = 4, 8 and
16 interpolation points. Then, plot f(x) along with the resulting (real) trigonometric
interpolants and discuss their accuracy.


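♠ 5.6.4. Answer Exercise 5.6.3 for the functions (a) $x^2$, (b) $(x - \pi)^2$, (c) sin x,
(d) $\cos \tfrac{1}{2} x$, (e) $\begin{cases} 1, & \tfrac{1}{2}\pi \le x \le \tfrac{3}{2}\pi, \\ 0, & \text{otherwise}, \end{cases}$ (f) $\begin{cases} x, & 0 \le x \le \pi, \\ 2\pi - x, & \pi \le x \le 2\pi. \end{cases}$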

5.6.5. (a) Draw a picture of the complex plane with the complex solutions to $z^6 = 1$ marked.
(b) What is the exact formula (no trigonometric functions allowed) for the primitive sixth
root of unity $\zeta_6$? (c) Verify explicitly that $1 + \zeta_6 + \zeta_6^2 + \zeta_6^3 + \zeta_6^4 + \zeta_6^5 = 0$. (d) Give a
geometrical explanation of this identity.

0  5.6.6.  (a)  Explain  in  detail  why  the  nth  roots  of  1  lie  on  the  vertices  of  a  regular  n- gon.  What 
is  the  angle  between  two  consecutive  sides? 

(b) Explain why this is also true for the nth roots of every non-zero complex number $z \ne 0$.
Sketch a picture of the hexagon corresponding to $\sqrt[6]{z}$ for a given $z \ne 0$.

0 5.6.7. In general, an nth root of unity $\zeta$ is called primitive if all the nth roots of unity
are obtained by raising it to successive powers: $1, \zeta, \zeta^2, \zeta^3, \ldots$. (a) Find all primitive
(i) fourth, (ii) fifth, (iii) ninth roots of unity. (b) Can you characterize all the primitive
nth roots of unity?


5.6.8. (a) In Example 5.27, the n = 4 discrete Fourier coefficients of the function
$f(x) = 2\pi x - x^2$ were found to be real. Is this true when n = 16? For general n?
(b) What property of a function f(x) will guarantee that its Fourier coefficients are real?

Figure 5.20. Denoising a Signal. (Panels: The Original Signal; The Noisy Signal; The Denoised Signal; Comparison of the Two.)

C 5.6.9. Let $\mathbf{c} = (c_0, c_1, \ldots, c_{n-1})^T \in \mathbb{C}^n$ be the vector of discrete Fourier coefficients
corresponding to the sample vector $\mathbf{f} = (f_0, f_1, \ldots, f_{n-1})^T$. (a) Explain why the sampled
signal $\mathbf{f} = F_n \mathbf{c}$ can be reconstructed by multiplying its Fourier coefficient vector by an n × n
matrix $F_n$. Write down $F_2$, $F_3$, $F_4$, and $F_8$. What is the general formula for the entries of
$F_n$? (b) Prove that, in general, $F_n^{-1} = \frac{1}{n} F_n^{\dagger} = \frac{1}{n} \overline{F_n}^{\,T}$, where $\dagger$ denotes the Hermitian
transpose defined in Exercise 4.3.25. (c) Prove that $U_n = \frac{1}{\sqrt{n}} F_n$ is a unitary matrix, i.e.,
$U_n^{-1} = U_n^{\dagger}$.


Compression  and  Denoising 


In  a  typical  experimental  signal,  noise  primarily  affects  the  high-frequency  modes,  while 
the  authentic  features  tend  to  appear  in  the  low  frequencies.  Think  of  the  hiss  and  static 
you  hear  on  an  AM  radio  station  or  a  low-quality  audio  recording.  Thus,  a  very  simple,  but 
effective,  method  for  denoising  a  corrupted  signal  is  to  decompose  it  into  its  Fourier  modes, 
as  in  (5.100),  and  then  discard  the  high-frequency  constituents.  A  similar  idea  underlies  the 
Dolby  recording  system  used  on  most  movie  soundtracks:  during  the  recording  process,  the 
high-frequency  modes  are  artificially  boosted,  so  that  scaling  them  back  when  the  movie 
is  shown  in  the  theater  has  the  effect  of  eliminating  much  of  the  extraneous  noise.  The 
one  design  issue  is  the  specification  of  a  cut-off  between  low  and  high  frequency,  that  is, 
between  signal  and  noise.  This  choice  will  depend  upon  the  properties  of  the  measured 
signal,  and  is  left  to  the  discretion  of  the  signal  processor. 

A correct implementation of the denoising procedure is facilitated by using the unaliased
forms (5.115, 116) of the trigonometric interpolant, in which the low-frequency summands
appear only when |k| is small. In this version, to eliminate high-frequency components,
we replace the full summation by

$$q_l(x) = \sum_{k=-l}^{l} c_k e^{ikx}, \tag{5.119}$$

where $l < \tfrac{1}{2}(n + 1)$ specifies the selected cut-off frequency between signal and noise. The
$2l + 1 \ll n$ low-frequency Fourier modes retained in (5.119) will, in favorable situations,
capture the essential features of the original signal while simultaneously eliminating the
high-frequency noise.

Figure 5.21. Compressing a Signal.

In Figure 5.20 we display a sample signal followed by the same signal corrupted by adding
in random noise. We use $n = 2^9 = 512$ sample points in the discrete Fourier representation,
and to remove the noise, we retain only the $2l + 1 = 11$ lowest-frequency modes. In
other words, instead of all n = 512 Fourier coefficients $c_{-256}, \ldots, c_{-1}, c_0, c_1, \ldots, c_{255}$, we
compute only the 11 lowest-order ones $c_{-5}, \ldots, c_5$. Orthogonality is the key that allows us
to do this! Summing up just those 11 exponentials produces the denoised signal
$q(x) = c_{-5} e^{-5ix} + \cdots + c_5 e^{5ix}$. To compare, we plot both the original signal and the denoised
version on the same graph. In this case, the maximal deviation is less than .15 over the
entire interval [0, 2π].
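
The denoising recipe (5.119) can be sketched in a few lines. The example below does not use the signal of Figure 5.20; instead it assumes a hypothetical band-limited test signal, uniform random noise of amplitude 0.5, n = 128 samples, and cut-off l = 5, all chosen purely for illustration. The helper names are likewise illustrative.

```python
import cmath
import math
import random

def dft(f):
    """Discrete Fourier coefficients (5.111)."""
    n = len(f)
    return [sum(f[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]

def idft(c):
    """Sampled values of the Fourier sum (5.100)."""
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def keep_low_modes(c, cutoff):
    """Zero every coefficient whose unaliased frequency exceeds the cut-off in absolute value."""
    n = len(c)
    kept = []
    for k in range(n):
        freq = k - n if 2 * k >= n else k      # unaliased index, as in (5.115, 116)
        kept.append(c[k] if abs(freq) <= cutoff else 0j)
    return kept

def rms(u, v):
    """Root-mean-square difference between two real sample vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))

n, cutoff, eps = 128, 5, 0.5
xs = [2 * math.pi * j / n for j in range(n)]
clean = [1 + math.sin(x) + 0.5 * math.cos(3 * x) for x in xs]   # hypothetical test signal
random.seed(0)                                                  # reproducible noise
noisy = [y + eps * random.uniform(-1, 1) for y in clean]

denoised = [z.real for z in idft(keep_low_modes(dft(noisy), cutoff))]
err_noisy = rms(noisy, clean)
err_denoised = rms(denoised, clean)
print(err_noisy, err_denoised)
```

Because the test signal lies entirely in the retained modes, truncation acts as an orthogonal projection of the noise, so the denoised samples are closer to the clean ones in the root-mean-square sense. For signals with genuine high-frequency content, the choice of cut-off involves the trade-off described above.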

The  same  idea  underlies  many  data  compression  algorithms  for  audio  recordings,  digital 
images,  and,  particularly,  video.  The  goal  is  efficient  storage  and/or  transmission  of  the 
signal.  As  before,  we  expect  all  the  important  features  to  be  contained  in  the  low-frequency 
constituents,  and  so  discarding  the  high-frequency  terms  will,  in  favorable  situations, 
not  lead  to  any  noticeable  degradation  of  the  signal  or  image.  Thus,  to  compress  a 
signal  (and,  simultaneously,  remove  high-frequency  noise),  we  retain  only  its  low-frequency 
discrete  Fourier  coefficients.  The  signal  is  reconstructed  by  summing  the  associated  discrete 
Fourier representation (5.119). A mathematical justification of Fourier-based compression
algorithms relies on the fact that the Fourier coefficients of smooth functions tend rapidly
to zero: the smoother the function, the faster the decay rate; see [61] for details. Thus,
the  small  high-frequency  Fourier  coefficients  will  be  of  negligible  importance. 

In Figure 5.21, the same signal is compressed by retaining, respectively, $2l + 1 = 21$
and $2l + 1 = 7$ Fourier coefficients only, instead of all n = 512 that would be required for
complete accuracy. For the case of moderate compression, the maximal deviation between
the signal and the compressed version is less than $1.5 \times 10^{-4}$ over the entire interval, while
even the highly compressed version deviates by at most .05 from the original signal. Of
course, the lack of any fine-scale features in this particular signal means that a very high
compression can be achieved: the more complicated or detailed the original signal, the
more Fourier modes need to be retained for accurate reproduction.

Exercises


♠ 5.6.10. Construct the discrete Fourier coefficients for

$$f(x) = \begin{cases} -x, & 0 \le x \le \tfrac{1}{3}\pi, \\ x - \tfrac{2}{3}\pi, & \tfrac{1}{3}\pi \le x \le \tfrac{4}{3}\pi, \\ -x + 2\pi, & \tfrac{4}{3}\pi \le x \le 2\pi, \end{cases}$$

based on n = 128 sample points. Then graph the reconstructed function when using
the data compression algorithm that retains only the 11 and 21 lowest-frequency modes.
Discuss what you observe.


5.6.11. Answer Exercise 5.6.10 when f(x) = (a) x; (b) $x^2 (2\pi - x)^2$; (c) $\begin{cases} \sin x, & 0 \le x \le \pi, \\ 0, & \pi \le x \le 2\pi. \end{cases}$


5.6.12. Let $q_l(x)$ denote the trigonometric polynomial (5.119) obtained by summing the first
$2l + 1$ discrete Fourier modes. Suppose the criterion for compression of a signal f(x) is
that $\| f - q_l \|_\infty = \max \{\, |f(x) - q_l(x)| : 0 \le x \le 2\pi \,\} < \varepsilon$. For the particular function in
Exercise 5.6.10, how large do you need to choose l when ε = .1? ε = .01? ε = .001?


♠ 5.6.13. Let f(x) = x(2π − x) be sampled on n = 128 equally spaced points between 0 and
2π. Use a random number generator with $-1 \le r_j \le 1$ to add noise by replacing each
sample value $f_j = f(x_j)$ by $g_j = f_j + \varepsilon\,r_j$. Investigate, for different values of ε, how many
discrete Fourier modes are required to reconstruct a reasonable denoised approximation to
the original signal.


♠ 5.6.14. The signal in Figure 5.20 was obtained from the explicit formula

/(x)  =  —  i  ^  X ^  ^  (x  +  1.5)(x  +  2.5)(x  -  4)  +  1.7. 

Noise  was  added  by  using  a  random  number  generator.  Experiment  with  different 
intensities  of  noise  and  different  numbers  of  sample  points  and  discuss  what  you  observe. 

X 5.6.15. If we use the original form (5.100) of the discrete Fourier representation, we might be
tempted to denoise/compress the signal by retaining only the first $0 \le k \le l$ terms in the
sum. Test this method on the signal in Exercise 5.6.10 and discuss what you observe.

5.6.16. True or false: If f(x) is real, the compressed/denoised signal (5.119) is a real
trigonometric polynomial.


The  Fast  Fourier  Transform 

While  one  may  admire  an  algorithm  for  its  intrinsic  beauty,  in  the  real  world,  the  bottom 
line  is  always  efficiency  of  implementation:  the  less  total  computation,  the  faster  the 
processing,  and  hence  the  more  extensive  the  range  of  applications.  Orthogonality  is  the 
first  and  most  important  feature  of  many  practical  linear  algebra  algorithms,  and  is  the 
critical  feature  of  Fourier  analysis.  Still,  even  the  power  of  orthogonality  reaches  its  limits 
when  it  comes  to  dealing  with  truly  large-scale  problems  such  as  three-dimensional  medical 
imaging or video processing. In the early 1960s, James Cooley and John Tukey [15]
discovered† a much more efficient approach to the Discrete Fourier Transform, exploiting
the rather special structure of the sampled exponential vectors. The resulting algorithm is
known as the Fast Fourier Transform, often abbreviated FFT, and its discovery launched
the modern revolution in digital signal and data processing, [9, 10].

† In fact, the key ideas can be found in Gauss’s hand computations in the early 1800s, but his
insight was not fully appreciated until modern computers arrived on the scene.

In general, computing all the discrete Fourier coefficients (5.111) of an $n$ times sampled
signal requires a total of $n^2$ complex multiplications and $n^2 - n$ complex additions. Note
also that each complex addition

$$ z + w = (x + i\,y) + (u + i\,v) = (x + u) + i\,(y + v) \tag{5.120} $$

generally requires two real additions, while each complex multiplication

$$ z\,w = (x + i\,y)(u + i\,v) = (x u - y v) + i\,(x v + y u) \tag{5.121} $$

requires 4 real multiplications and 2 real additions, or, by employing the alternative formula

$$ x v + y u = (x + y)(u + v) - x u - y v \tag{5.122} $$

for the imaginary part, 3 real multiplications and 5 real additions. (The choice of formula
(5.121) or (5.122) will depend upon the processor’s relative speeds of multiplication and
addition.) Similarly, given the Fourier coefficients $c_0, \ldots, c_{n-1}$, reconstruction of the sampled
signal via (5.100) requires $n^2 - n$ complex multiplications and $n^2 - n$ complex additions.
As a result, both computations become quite labor-intensive for large $n$. Extending these
ideas to multi-dimensional data only exacerbates the problem. The Fast Fourier Transform
provides a shortcut around this computational bottleneck and thereby significantly extends
the range of discrete Fourier analysis.
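The trade-off between formulas (5.121) and (5.122) can be checked with a minimal sketch. The function names `mul4` and `mul3` are illustrative only and do not come from the text; each takes the real and imaginary parts of the two factors and returns the real and imaginary parts of the product.

```python
def mul4(x, y, u, v):
    """Product (x + iy)(u + iv) via (5.121): 4 real multiplications, 2 real additions."""
    return (x * u - y * v, x * v + y * u)


def mul3(x, y, u, v):
    """Same product, imaginary part via (5.122): 3 real multiplications, 5 real additions."""
    xu = x * u                    # multiplication 1
    yv = y * v                    # multiplication 2
    s = (x + y) * (u + v)         # multiplication 3 (plus 2 additions)
    return (xu - yv, s - xu - yv)  # 3 more additions/subtractions


if __name__ == "__main__":
    # Both routines should agree on (1 + 2i)(3 + 4i).
    print(mul4(1, 2, 3, 4), mul3(1, 2, 3, 4))
```

With exact (integer or rational) inputs the two routines agree exactly; in floating point they may differ by rounding.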

In order to explain the method without undue complication, we return to the original,
aliased form of the discrete Fourier representation (5.100). (Once one understands how
the FFT works, one can easily adapt the algorithm to the low-frequency version (5.116).)
The seminal observation is that if the number of sample points $n = 2m$ is even, then the
primitive $m$th root of unity $\zeta_m = \sqrt[m]{1}$ equals the square of the primitive $n$th root: $\zeta_m = \zeta_n^2$.
We use this fact to split the summation (5.111) for the order $n$ discrete Fourier coefficients
into two parts, collecting the even and the odd powers of $\zeta_n^{-k}$:

$$
\begin{aligned}
c_k &= \frac{1}{n}\Bigl( f_0 + f_1\,\zeta_n^{-k} + f_2\,\zeta_n^{-2k} + \cdots + f_{n-1}\,\zeta_n^{-(n-1)k} \Bigr) \\
&= \frac{1}{n}\Bigl( f_0 + f_2\,\zeta_n^{-2k} + f_4\,\zeta_n^{-4k} + \cdots + f_{2m-2}\,\zeta_n^{-(2m-2)k} \Bigr)
 + \zeta_n^{-k}\,\frac{1}{n}\Bigl( f_1 + f_3\,\zeta_n^{-2k} + f_5\,\zeta_n^{-4k} + \cdots + f_{2m-1}\,\zeta_n^{-(2m-2)k} \Bigr) \\
&= \frac{1}{2}\Bigl\{ \frac{1}{m}\Bigl( f_0 + f_2\,\zeta_m^{-k} + f_4\,\zeta_m^{-2k} + \cdots + f_{2m-2}\,\zeta_m^{-(m-1)k} \Bigr) \Bigr\}
 + \frac{\zeta_n^{-k}}{2}\Bigl\{ \frac{1}{m}\Bigl( f_1 + f_3\,\zeta_m^{-k} + f_5\,\zeta_m^{-2k} + \cdots + f_{2m-1}\,\zeta_m^{-(m-1)k} \Bigr) \Bigr\}.
\end{aligned}
\tag{5.123}
$$


Now,  observe  that  the  expressions  in  braces  are  the  order  m  Fourier  coefficients  for  the 
sample  data 


$$
\begin{aligned}
\mathbf{f}^{e} &= ( f_0, f_2, f_4, \ldots, f_{2m-2} )^T = \bigl( f(x_0), f(x_2), f(x_4), \ldots, f(x_{2m-2}) \bigr)^T, \\
\mathbf{f}^{o} &= ( f_1, f_3, f_5, \ldots, f_{2m-1} )^T = \bigl( f(x_1), f(x_3), f(x_5), \ldots, f(x_{2m-1}) \bigr)^T.
\end{aligned}
\tag{5.124}
$$

Note that $\mathbf{f}^e$ is obtained by sampling $f(x)$ on the even sample points $x_{2j}$, while $\mathbf{f}^o$ is
obtained by sampling the same function $f(x)$, but now at the odd sample points $x_{2j+1}$. In
other words, we are splitting the original sampled signal into two “half-sampled” signals
obtained by sampling on every other point. The even and odd Fourier coefficients are

$$
\begin{aligned}
c_k^{e} &= \frac{1}{m}\Bigl( f_0 + f_2\,\zeta_m^{-k} + f_4\,\zeta_m^{-2k} + \cdots + f_{2m-2}\,\zeta_m^{-(m-1)k} \Bigr), \\
c_k^{o} &= \frac{1}{m}\Bigl( f_1 + f_3\,\zeta_m^{-k} + f_5\,\zeta_m^{-2k} + \cdots + f_{2m-1}\,\zeta_m^{-(m-1)k} \Bigr),
\end{aligned}
\qquad k = 0, \ldots, m-1.
\tag{5.125}
$$

Since they contain just $m$ data values, both the even and odd samples require only $m$
distinct Fourier coefficients, and we adopt the identification

$$ c_{k+m}^{e} = c_k^{e}, \qquad c_{k+m}^{o} = c_k^{o}, \qquad k = 0, \ldots, m-1. \tag{5.126} $$


Therefore, the order $n = 2m$ discrete Fourier coefficients (5.123) can be constructed from
a pair of order $m$ discrete Fourier coefficients via

$$ c_k = \tfrac{1}{2}\bigl( c_k^{e} + \zeta_n^{-k}\, c_k^{o} \bigr), \qquad k = 0, \ldots, n-1. \tag{5.127} $$
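The even/odd splitting (5.127) translates almost directly into a recursive routine. The following is a minimal sketch, not the final algorithm presented in the text. It assumes the normalization $c_k = \frac{1}{n}\sum_j f_j\,\zeta_n^{-jk}$ with $\zeta_n = e^{2\pi i/n}$, consistent with (5.123), and that $n$ is a power of 2. The names `dft_direct` and `fft_recursive` are illustrative.

```python
import cmath


def dft_direct(f):
    """Direct O(n^2) evaluation of c_k = (1/n) * sum_j f_j * zeta_n^(-j k)."""
    n = len(f)
    return [sum(f[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]


def fft_recursive(f):
    """Discrete Fourier coefficients via the even/odd splitting (5.127)."""
    n = len(f)
    if n == 1:
        return [complex(f[0])]          # order 1: the single sample value
    if n % 2 != 0:
        raise ValueError("length must be a power of 2")
    m = n // 2
    ce = fft_recursive(f[0::2])         # order m coefficients of the even samples
    co = fft_recursive(f[1::2])         # order m coefficients of the odd samples
    # Indices k >= m reuse the order m coefficients, as in the identification (5.126).
    return [0.5 * (ce[k % m] + cmath.exp(-2j * cmath.pi * k / n) * co[k % m])
            for k in range(n)]


if __name__ == "__main__":
    samples = [1, 2, 3, 4, 0, -1, 2, 5]
    fast, slow = fft_recursive(samples), dft_direct(samples)
    print(max(abs(a - b) for a, b in zip(fast, slow)))  # difference at rounding level
```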


Now if $m = 2l$ is also even, then we can play the same game on the order $m$ Fourier
coefficients (5.125), reconstructing each of them from a pair of order $l$ discrete Fourier
coefficients, obtained by sampling the signal at every fourth point. If $n = 2^r$ is a power
of 2, then this game can be played all the way back to the start, beginning with the trivial
order 1 discrete Fourier representation, which just samples the function at a single point.
The result is the desired algorithm. After some rearrangement of the basic steps, we arrive
at the Fast Fourier Transform, which we now present in its final form.

We begin with a sampled signal on $n = 2^r$ sample points. To efficiently program the
Fast Fourier Transform, it helps to write out each index $0 \le j < 2^r$ in its binary (as
opposed to decimal) representation

$$ j = j_{r-1}\, j_{r-2} \cdots j_2\, j_1\, j_0, \qquad \text{where } j_\nu = 0 \text{ or } 1; \tag{5.128} $$

the notation is shorthand for its $r$ digit binary expansion

$$ j = j_0 + 2 j_1 + 4 j_2 + 8 j_3 + \cdots + 2^{r-1} j_{r-1}. $$

We then define the bit reversal map

$$ \rho( j_{r-1}\, j_{r-2} \cdots j_2\, j_1\, j_0 ) = j_0\, j_1\, j_2 \cdots j_{r-2}\, j_{r-1}. \tag{5.129} $$

For instance, if $r = 5$, and $j = 13$, with 5 digit binary representation 01101, then $\rho(j) = 22$
has the reversed binary representation 10110. Note especially that the bit reversal map
$\rho = \rho_r$ depends upon the original choice of $r = \log_2 n$.
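The bit reversal map (5.129) can be sketched in a few lines of Python. The function name `bit_reverse` and its argument order are assumptions made for this illustration.

```python
def bit_reverse(j, r):
    """Reverse the r-digit binary representation of j, as in the map rho of (5.129)."""
    rev = 0
    for _ in range(r):
        rev = (rev << 1) | (j & 1)  # append the lowest remaining digit of j
        j >>= 1                     # discard that digit
    return rev


if __name__ == "__main__":
    print(bit_reverse(13, 5))                      # 01101 reversed is 10110
    print([bit_reverse(j, 3) for j in range(8)])   # reordering used when r = 3
```

The same index gives different results for different $r$, which is the dependence on $r = \log_2 n$ noted above.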

Secondly, for each $0 \le k < r$, define the maps

$$
\begin{aligned}
\alpha_k(j) &= j_{r-1} \cdots j_{k+1}\, 0\, j_{k-1} \cdots j_0, \\
\beta_k(j) &= j_{r-1} \cdots j_{k+1}\, 1\, j_{k-1} \cdots j_0 = \alpha_k(j) + 2^k,
\end{aligned}
\qquad \text{for } j = j_{r-1}\, j_{r-2} \cdots j_1\, j_0.
\tag{5.130}
$$

In other words, $\alpha_k(j)$ sets the $k$th binary digit of $j$ to 0, while $\beta_k(j)$ sets it to 1. In the
preceding example, $\alpha_2(13) = 9$, with binary form 01001, while $\beta_2(13) = 13$ with binary
form 01101. The bit operations (5.129, 130) are especially easy to implement on modern
binary computers.
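As an illustration of how simple these bit operations are, the maps of (5.130) reduce to a mask-and-clear and a mask-and-set. The function names `alpha` and `beta` are chosen here to mirror the notation and are otherwise hypothetical.

```python
def alpha(k, j):
    """Set the k-th binary digit of j to 0, as alpha_k(j) in (5.130)."""
    return j & ~(1 << k)


def beta(k, j):
    """Set the k-th binary digit of j to 1, as beta_k(j) in (5.130)."""
    return j | (1 << k)


if __name__ == "__main__":
    print(alpha(2, 13), beta(2, 13))   # the example from the text, with j = 13 and k = 2
```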

Given a sampled signal $f_0, \ldots, f_{n-1}$, its discrete Fourier coefficients $c_0, \ldots, c_{n-1}$ are
computed by the following iterative algorithm:

$$
c_j^{(0)} = f_{\rho(j)}, \qquad
c_j^{(k+1)} = \tfrac{1}{2}\Bigl( c^{(k)}_{\alpha_k(j)} + \zeta_{2^{k+1}}^{-j}\, c^{(k)}_{\beta_k(j)} \Bigr),
\qquad j = 0, \ldots, n-1, \quad k = 0, \ldots, r-1,
\tag{5.131}
$$

in which $\zeta_{2^{k+1}}$ is the primitive $2^{k+1}$ root of unity. The final output of the iterative
procedure, namely

$$ c_j = c_j^{(r)}, \qquad j = 0, \ldots, n-1, \tag{5.132} $$

are the discrete Fourier coefficients of our signal. The preprocessing step of the algorithm,
where we define $c_j^{(0)}$, produces a more convenient rearrangement of the sample values. The
subsequent steps successively combine the Fourier coefficients of the appropriate even and
odd sampled subsignals, reproducing (5.123) in a different notation. The following example
should help make the overall process clearer.

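Example 5.28. Consider the case $r = 3$, and so our signal has $n = 2^3 = 8$ sampled
values $f_0, f_1, \ldots, f_7$. We begin the process by rearranging the sample values

$$
c_0^{(0)} = f_0, \quad c_1^{(0)} = f_4, \quad c_2^{(0)} = f_2, \quad c_3^{(0)} = f_6, \quad
c_4^{(0)} = f_1, \quad c_5^{(0)} = f_5, \quad c_6^{(0)} = f_3, \quad c_7^{(0)} = f_7,
$$

in the order specified by the bit reversal map $\rho$. For instance, $\rho(3) = 6$, or, in binary
notation, $\rho(011) = 110$.

The first stage of the iteration is based on $\zeta_2 = -1$. Equation (5.131) gives

$$
\begin{aligned}
c_0^{(1)} &= \tfrac12\bigl(c_0^{(0)} + c_1^{(0)}\bigr), & c_1^{(1)} &= \tfrac12\bigl(c_0^{(0)} - c_1^{(0)}\bigr), & c_2^{(1)} &= \tfrac12\bigl(c_2^{(0)} + c_3^{(0)}\bigr), & c_3^{(1)} &= \tfrac12\bigl(c_2^{(0)} - c_3^{(0)}\bigr), \\
c_4^{(1)} &= \tfrac12\bigl(c_4^{(0)} + c_5^{(0)}\bigr), & c_5^{(1)} &= \tfrac12\bigl(c_4^{(0)} - c_5^{(0)}\bigr), & c_6^{(1)} &= \tfrac12\bigl(c_6^{(0)} + c_7^{(0)}\bigr), & c_7^{(1)} &= \tfrac12\bigl(c_6^{(0)} - c_7^{(0)}\bigr),
\end{aligned}
$$

where we combine successive pairs of the rearranged sample values. The second stage of
the iteration has $k = 1$ with $\zeta_4 = i$. We obtain

$$
\begin{aligned}
c_0^{(2)} &= \tfrac12\bigl(c_0^{(1)} + c_2^{(1)}\bigr), & c_1^{(2)} &= \tfrac12\bigl(c_1^{(1)} - i\,c_3^{(1)}\bigr), & c_2^{(2)} &= \tfrac12\bigl(c_0^{(1)} - c_2^{(1)}\bigr), & c_3^{(2)} &= \tfrac12\bigl(c_1^{(1)} + i\,c_3^{(1)}\bigr), \\
c_4^{(2)} &= \tfrac12\bigl(c_4^{(1)} + c_6^{(1)}\bigr), & c_5^{(2)} &= \tfrac12\bigl(c_5^{(1)} - i\,c_7^{(1)}\bigr), & c_6^{(2)} &= \tfrac12\bigl(c_4^{(1)} - c_6^{(1)}\bigr), & c_7^{(2)} &= \tfrac12\bigl(c_5^{(1)} + i\,c_7^{(1)}\bigr).
\end{aligned}
$$

Note that the indices of the combined pairs of coefficients differ by 2. In the last step,
where $k = 2$ and $\zeta_8 = \frac{\sqrt2}{2}(1 + i)$, we combine coefficients whose indices differ by 4;
the final output

$$
\begin{aligned}
c_0 = c_0^{(3)} &= \tfrac12\bigl(c_0^{(2)} + c_4^{(2)}\bigr), & c_4 = c_4^{(3)} &= \tfrac12\bigl(c_0^{(2)} - c_4^{(2)}\bigr), \\
c_1 = c_1^{(3)} &= \tfrac12\bigl(c_1^{(2)} + \tfrac{\sqrt2}{2}(1 - i)\,c_5^{(2)}\bigr), & c_5 = c_5^{(3)} &= \tfrac12\bigl(c_1^{(2)} - \tfrac{\sqrt2}{2}(1 - i)\,c_5^{(2)}\bigr), \\
c_2 = c_2^{(3)} &= \tfrac12\bigl(c_2^{(2)} - i\,c_6^{(2)}\bigr), & c_6 = c_6^{(3)} &= \tfrac12\bigl(c_2^{(2)} + i\,c_6^{(2)}\bigr), \\
c_3 = c_3^{(3)} &= \tfrac12\bigl(c_3^{(2)} - \tfrac{\sqrt2}{2}(1 + i)\,c_7^{(2)}\bigr), & c_7 = c_7^{(3)} &= \tfrac12\bigl(c_3^{(2)} + \tfrac{\sqrt2}{2}(1 + i)\,c_7^{(2)}\bigr),
\end{aligned}
$$

is the complete set of discrete Fourier coefficients.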
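The three stages of the example follow the pattern of the iterative algorithm (5.131), which can be sketched in Python as below. This is a minimal illustration, assuming $\zeta_n = e^{2\pi i/n}$ and a signal whose length is a power of 2. The names `bit_reverse`, `fft_iterative`, and `dft_direct` are illustrative, and no attempt is made to skip the trivial multiplications by $\pm 1$ and $\pm i$.

```python
import cmath


def bit_reverse(j, r):
    """Reverse the r-digit binary representation of j (the map rho)."""
    rev = 0
    for _ in range(r):
        rev = (rev << 1) | (j & 1)
        j >>= 1
    return rev


def fft_iterative(f):
    """Discrete Fourier coefficients c_0, ..., c_{n-1} via the iteration (5.131)."""
    n = len(f)
    r = n.bit_length() - 1
    if n < 1 or n != 1 << r:
        raise ValueError("length must be a power of 2")
    # Preprocessing step: c_j^(0) = f_{rho(j)}.
    c = [complex(f[bit_reverse(j, r)]) for j in range(n)]
    for k in range(r):
        size = 1 << (k + 1)             # zeta_size is the primitive 2^(k+1) root of unity
        c = [0.5 * (c[j & ~(1 << k)]                            # index alpha_k(j)
                    + cmath.exp(-2j * cmath.pi * j / size)      # zeta_size^(-j)
                    * c[j | (1 << k)])                          # index beta_k(j)
             for j in range(n)]
    return c


def dft_direct(f):
    """Direct O(n^2) evaluation, used here only as a cross-check."""
    n = len(f)
    return [sum(f[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]


if __name__ == "__main__":
    samples = [1, 2, 3, 4, 0, -1, 2, 5]
    fast, slow = fft_iterative(samples), dft_direct(samples)
    print(max(abs(a - b) for a, b in zip(fast, slow)))  # difference at rounding level
```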


Let us count the number of arithmetic operations required in the Fast Fourier Transform
algorithm. At each stage in the computation, we must perform $n = 2^r$ complex additions/
subtractions and the same number of complex multiplications. (Actually, the number of
multiplications is slightly smaller, since multiplications by $\pm 1$ and $\pm i$ are extremely simple.
However, this does not significantly alter the final operations count.) There are $r = \log_2 n$
stages, and so we require a total of $r\,n = n \log_2 n$ complex additions/subtractions and the
same number of multiplications. Now, when $n$ is large, $n \log_2 n$ is significantly smaller than
$n^2$, which is the number of operations required for the direct algorithm. For instance, if
$n = 2^{10} = 1{,}024$, then $n^2 = 1{,}048{,}576$, while $n \log_2 n = 10{,}240$, a net savings of 99%. As a
result, many large-scale computations that would be intractable using the direct approach
are immediately brought into the realm of feasibility. This is the reason why all modern
implementations of the Discrete Fourier Transform are based on the FFT algorithm and
its variants.
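The operation counts quoted above are easy to tabulate for other signal lengths. The helper name `operation_counts` is hypothetical, and the counts are the leading-order figures $n^2$ and $n \log_2 n$ from the text, not exact counts for any particular implementation.

```python
def operation_counts(r):
    """Return (n, n^2, n*log2(n)) for a signal with n = 2^r sample points."""
    n = 2 ** r
    return n, n * n, n * r


if __name__ == "__main__":
    for r in (3, 10, 20):
        n, direct, fast = operation_counts(r)
        print(f"n = {n}: direct {direct}, FFT {fast}, ratio {fast / direct:.6f}")
```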

The reconstruction of the signal from the discrete Fourier coefficients $c_0, \ldots, c_{n-1}$ is
sped up in exactly the same manner. The only differences are that we replace $\zeta_n^{-1} = \overline{\zeta_n}$
by $\zeta_n$, and drop the factors of $\frac12$, since there is no need to divide by $n$ in the final result
(5.100). Therefore, we apply the slightly modified iterative procedure

$$
f_j^{(0)} = c_{\rho(j)}, \qquad
f_j^{(k+1)} = f^{(k)}_{\alpha_k(j)} + \zeta_{2^{k+1}}^{\,j}\, f^{(k)}_{\beta_k(j)},
\qquad j = 0, \ldots, n-1, \quad k = 0, \ldots, r-1,
\tag{5.133}
$$

and finish with

$$ f(x_j) = f_j = f_j^{(r)}, \qquad j = 0, \ldots, n-1. \tag{5.134} $$

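Example 5.29. The reconstruction formulas in the case of $n = 8 = 2^3$ Fourier coefficients
$c_0, \ldots, c_7$, which were computed in Example 5.28, can be implemented as follows. First,
we rearrange the Fourier coefficients in bit reversed order:

$$
f_0^{(0)} = c_0, \quad f_1^{(0)} = c_4, \quad f_2^{(0)} = c_2, \quad f_3^{(0)} = c_6, \quad
f_4^{(0)} = c_1, \quad f_5^{(0)} = c_5, \quad f_6^{(0)} = c_3, \quad f_7^{(0)} = c_7.
$$

Then we begin combining them in successive pairs:

$$
\begin{aligned}
f_0^{(1)} &= f_0^{(0)} + f_1^{(0)}, & f_1^{(1)} &= f_0^{(0)} - f_1^{(0)}, & f_2^{(1)} &= f_2^{(0)} + f_3^{(0)}, & f_3^{(1)} &= f_2^{(0)} - f_3^{(0)}, \\
f_4^{(1)} &= f_4^{(0)} + f_5^{(0)}, & f_5^{(1)} &= f_4^{(0)} - f_5^{(0)}, & f_6^{(1)} &= f_6^{(0)} + f_7^{(0)}, & f_7^{(1)} &= f_6^{(0)} - f_7^{(0)}.
\end{aligned}
$$

Next,

$$
\begin{aligned}
f_0^{(2)} &= f_0^{(1)} + f_2^{(1)}, & f_1^{(2)} &= f_1^{(1)} + i\,f_3^{(1)}, & f_2^{(2)} &= f_0^{(1)} - f_2^{(1)}, & f_3^{(2)} &= f_1^{(1)} - i\,f_3^{(1)}, \\
f_4^{(2)} &= f_4^{(1)} + f_6^{(1)}, & f_5^{(2)} &= f_5^{(1)} + i\,f_7^{(1)}, & f_6^{(2)} &= f_4^{(1)} - f_6^{(1)}, & f_7^{(2)} &= f_5^{(1)} - i\,f_7^{(1)}.
\end{aligned}
$$

Finally, the sampled signal values are

$$
\begin{aligned}
f(x_0) = f_0^{(3)} &= f_0^{(2)} + f_4^{(2)}, & f(x_4) = f_4^{(3)} &= f_0^{(2)} - f_4^{(2)}, \\
f(x_1) = f_1^{(3)} &= f_1^{(2)} + \tfrac{\sqrt2}{2}(1 + i)\,f_5^{(2)}, & f(x_5) = f_5^{(3)} &= f_1^{(2)} - \tfrac{\sqrt2}{2}(1 + i)\,f_5^{(2)}, \\
f(x_2) = f_2^{(3)} &= f_2^{(2)} + i\,f_6^{(2)}, & f(x_6) = f_6^{(3)} &= f_2^{(2)} - i\,f_6^{(2)}, \\
f(x_3) = f_3^{(3)} &= f_3^{(2)} - \tfrac{\sqrt2}{2}(1 - i)\,f_7^{(2)}, & f(x_7) = f_7^{(3)} &= f_3^{(2)} + \tfrac{\sqrt2}{2}(1 - i)\,f_7^{(2)}.
\end{aligned}
$$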
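The reconstruction procedure (5.133, 134) differs from the forward iteration only in the sign of the exponent and the missing factors of $\frac12$, as the following sketch shows. It assumes $\zeta_n = e^{2\pi i/n}$ and reconstructs $f_j = \sum_k c_k\,\zeta_n^{jk}$. The names `bit_reverse` and `ifft_iterative` are illustrative.

```python
import cmath


def bit_reverse(j, r):
    """Reverse the r-digit binary representation of j (the map rho)."""
    rev = 0
    for _ in range(r):
        rev = (rev << 1) | (j & 1)
        j >>= 1
    return rev


def ifft_iterative(c):
    """Sample values f_j = sum_k c_k * zeta_n^(j k) via the iteration (5.133)."""
    n = len(c)
    r = n.bit_length() - 1
    if n < 1 or n != 1 << r:
        raise ValueError("length must be a power of 2")
    # Start from the coefficients in bit reversed order: f_j^(0) = c_{rho(j)}.
    f = [complex(c[bit_reverse(j, r)]) for j in range(n)]
    for k in range(r):
        size = 1 << (k + 1)
        f = [f[j & ~(1 << k)]                                   # index alpha_k(j)
             + cmath.exp(2j * cmath.pi * j / size)              # zeta_size^(+j)
             * f[j | (1 << k)]                                  # index beta_k(j)
             for j in range(n)]
    return f


if __name__ == "__main__":
    coeffs = [1, 2 - 1j, 0, 3j, -2, 0.5, 1 + 1j, 4]
    n = len(coeffs)
    direct = [sum(coeffs[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n))
              for j in range(n)]
    print(max(abs(a - b) for a, b in zip(ifft_iterative(coeffs), direct)))
```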


Exercises 


4b 5.6.17. Use the Fast Fourier Transform to find the discrete Fourier coefficients for the
following functions using the indicated number of sample points. Carefully indicate each
step in your analysis.

(a) $x/\pi$, $n = 4$; (b) $\sin x$, $n = 8$; (c) $|x - \pi|$, $n = 8$; (d) $\operatorname{sign}(x - \pi)$, $n = 16$.


5.6.18. Use the Inverse Fast Fourier Transform to reassemble the sampled function data
corresponding to the following discrete Fourier coefficients. Carefully indicate each step in
your analysis.

(a) $c_0 = c_2 = 1$, $c_1 = c_3 = -1$, (b) $c_0 = c_1 = c_4 = 2$, $c_2 = c_6 = 0$, $c_3 = c_5 = c_7 = -1$.

C 5.6.19. In this exercise, we show how the Fast Fourier Transform is equivalent to a certain
matrix factorization. Let $\mathbf{c} = ( c_0, c_1, \ldots, c_7 )^T$ be the vector of Fourier coefficients,
and let $\mathbf{f}^{(k)} = ( f_0^{(k)}, f_1^{(k)}, \ldots, f_7^{(k)} )^T$, for $k = 0, 1, 2, 3$, be the vectors containing
the coefficients defined in the reconstruction algorithm Example 5.29. (a) Show that
$\mathbf{f}^{(0)} = M_0\,\mathbf{c}$, $\mathbf{f}^{(1)} = M_1\,\mathbf{f}^{(0)}$, $\mathbf{f}^{(2)} = M_2\,\mathbf{f}^{(1)}$, $\mathbf{f} = \mathbf{f}^{(3)} = M_3\,\mathbf{f}^{(2)}$, where $M_0, M_1, M_2, M_3$
are $8 \times 8$ matrices. Write down their explicit forms. (b) Explain why the matrix product
$F_8 = M_3 M_2 M_1 M_0$ reproduces the Fourier matrix derived in Exercise 5.6.9. Check the
factorization directly. (c) Write down the corresponding matrix factorization for the direct
algorithm of Example 5.28.


Chapter  6 
Equilibrium 


In this chapter, we will apply what we have learned so far to the analysis of equilibrium
configurations and stability of mechanical structures and electrical networks. Both physical
problems fit into a common, and surprisingly general, mathematical framework. The
physical laws of equilibrium mechanics and circuits lead to linear algebraic systems whose
coefficient matrix is of positive (semi-)definite Gram form. The positive definite cases
correspond to stable structures and networks, which can support any applied forcing or external
current, producing a unique, stable equilibrium solution that can be characterized by an
energy minimization principle. On the other hand, systems with semi-definite coefficient
matrices model unstable structures and networks that are unable to remain in equilibrium
except under very special configurations of external forces. In the case of mechanical
structures, the instabilities are of two types: rigid motions, in which the structure moves while
maintaining its overall geometrical shape, and mechanisms, in which it spontaneously
deforms in the absence of any applied force. The same linear algebra framework, but now
reformulated for infinite-dimensional function space, also characterizes the boundary value
problems for both ordinary and partial differential equations that model the equilibria of
continuous media, including bars, beams, solid bodies, and many other systems arising
throughout physics and engineering, [61, 79].

The  starting  point  is  a  linear  chain  of  masses  interconnected  by  springs  and  constrained 
to  move  only  in  the  longitudinal  direction.  Our  general  mathematical  framework  is  already 
manifest  in  this  rather  simple  mechanical  system.  In  the  second  section,  we  discuss  simple 
electrical  networks  consisting  of  resistors,  current  sources  and/or  batteries,  interconnected 
by  a  network  of  wires.  Here,  the  resulting  Gram  matrix  is  known  as  the  graph  Laplacian, 
which  plays  an  increasingly  important  role  in  modern  data  analysis  and  network  theory. 
Finally,  we  treat  small  (so  as  to  remain  in  a  linear  modeling  regime)  displacements  of 
two-  and  three-dimensional  structures  constructed  out  of  elastic  bars.  In  all  cases,  we 
consider  only  the  equilibrium  solutions.  Dynamical  (time-varying)  processes  for  each  of 
these  physical  systems  are  governed  by  linear  systems  of  ordinary  differential  equations,  to 
be  formulated  and  analyzed  in  Chapter  10. 

6.1  Springs  and  Masses 

A mass-spring chain consists of $n$ masses $m_1, m_2, \ldots, m_n$ arranged in a straight line. Each
mass is connected to its immediate neighbor(s) by springs. Moreover, the chain may be
connected at one or both ends to a fixed support by a spring, or may even be completely
free, e.g., floating in outer space. For specificity, let us first look at the case when both
ends of the chain are attached to unmoving supports, as illustrated in Figure 6.1.

We  assume  that  the  masses  are  arranged  in  a  vertical  line,  and  order  them  from  top  to 
bottom.  For  simplicity,  we  will  only  allow  the  masses  to  move  in  the  vertical  direction,  that 
is,  we  restrict  to  a  one-dimensional  motion.  (Section  6.3  deals  with  the  more  complicated 
two- and three-dimensional situations.)

Figure 6.1. A Mass-Spring Chain with Fixed Ends.


If we subject some or all of the masses to an external force, e.g., gravity, then the system
will move† to a new equilibrium position. The resulting position of the $i$th mass is measured
by its displacement $u_i$ from its original position, which, since we are only allowing vertical
motion, is a scalar quantity. Referring to Figure 6.1, we use the convention that $u_i > 0$
if the mass has moved downwards, and $u_i < 0$ if it has moved upwards. Our goal is to
determine the new equilibrium configuration of the chain under the prescribed forcing, that
is, to set up and solve a system of equations for the displacements $u_1, \ldots, u_n$.

As sketched in Figure 6.2, let $e_j$ denote the elongation of the $j$th spring, which connects
mass $m_{j-1}$ to mass $m_j$. By “elongation”, we mean how far the spring has been stretched,
so that $e_j > 0$ if the spring is longer than its reference length, while $e_j < 0$ if the spring
has been compressed. The elongations of the internal springs can be determined directly
from the displacements of the masses at each end according to the geometric formula

$$ e_j = u_j - u_{j-1}, \qquad j = 2, \ldots, n, \tag{6.1} $$

while, for the top and bottom springs,

$$ e_1 = u_1, \qquad e_{n+1} = -u_n, \tag{6.2} $$

since the supports are not allowed to move. We write the elongation equations (6.1-2) in
matrix form

$$ \mathbf{e} = A\,\mathbf{u}, \tag{6.3} $$

where $\mathbf{e} = ( e_1, e_2, \ldots, e_{n+1} )^T$ is the elongation vector, $\mathbf{u} = ( u_1, u_2, \ldots, u_n )^T$ is the
displacement vector, and the coefficient matrix

$$
A = \begin{pmatrix}
1 & & & & \\
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1 \\
 & & & & -1
\end{pmatrix}
\tag{6.4}
$$

has size $(n + 1) \times n$, with only its non-zero entries being indicated. We refer to $A$ as
the reduced incidence matrix‡ for the mass-spring chain. The incidence matrix effectively
encodes the underlying geometry of the system, including the fixed “boundary conditions”
at the top and the bottom.

Figure 6.2. Elongation of a Spring.

† The differential equations governing its dynamical behavior during the motion will be the
subject of Chapter 10. Damping or frictional effects will cause the system to eventually settle
down into a stable equilibrium configuration, if such exists.
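To make the geometric relation concrete, here is a minimal sketch that builds the reduced incidence matrix for a chain with both ends fixed and applies it to a displacement vector. The helper names `incidence_fixed_ends` and `matvec` are hypothetical, and matrices are stored as plain lists of rows.

```python
def incidence_fixed_ends(n):
    """(n+1) x n reduced incidence matrix for n masses between two fixed supports."""
    A = [[0] * n for _ in range(n + 1)]
    for j in range(n):
        A[j][j] = 1        # spring j+1 is stretched when the mass below it moves down
        A[j + 1][j] = -1   # spring j+2 is compressed when the mass above it moves down
    return A


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]


if __name__ == "__main__":
    A = incidence_fixed_ends(3)
    for row in A:
        print(row)
    # Elongations for the sample displacements u = (1, 2, 1).
    print(matvec(A, [1, 2, 1]))
```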

The next step is to relate the elongation $e_j$ experienced by the $j$th spring to its internal
force $y_j$. This is the basic constitutive assumption, which relates geometry to kinematics.
In the present case, we suppose that the springs are not stretched (or compressed)
particularly far. Under this assumption, Hooke’s Law, named in honor of the seventeenth-century
English scientist and inventor Robert Hooke, states that the internal force is directly
proportional to the elongation: the more you stretch a spring, the more it tries to pull you
back. Thus,

$$ y_j = c_j\, e_j, \tag{6.5} $$

where the constant of proportionality $c_j > 0$ measures the spring’s stiffness. Hard springs
have large stiffness and so take a large force to stretch, whereas soft springs have a small,
but still positive, stiffness. We will also write the constitutive equations (6.5) in matrix
form

$$ \mathbf{y} = C\,\mathbf{e}, \tag{6.6} $$

where

$$
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n+1} \end{pmatrix},
\qquad
C = \begin{pmatrix} c_1 & & & \\ & c_2 & & \\ & & \ddots & \\ & & & c_{n+1} \end{pmatrix}
$$

are the internal force vector and the matrix of spring stiffnesses. Note particularly that $C$
is a diagonal matrix, and, more importantly, positive definite, $C > 0$, since all its diagonal
entries are strictly positive.

Figure 6.3. Force Balance.

‡ The connection with the incidence matrix of a graph, as introduced in Section 2.6, will become
evident in the following Section 6.2.

Finally, the forces must balance if the system is to remain in equilibrium. In this
simplified model, the external forces act only on the masses, and not on the springs. Let
$f_i$ denote the external force on the $i$th mass $m_i$. We also measure force in the downward
direction, so $f_i > 0$ means that the force is pulling the $i$th mass downward. (In particular,
gravity would induce a positive force on each mass.) If the $i$th spring is stretched, it will
exert an upward force on $m_i$, while if the $(i + 1)$st spring is stretched, it will pull $m_i$
downward. Therefore, the balance of forces on $m_i$ requires that

$$ f_i = y_i - y_{i+1}. $$

The vectorial form of the force balance law is

$$ \mathbf{f} = A^T \mathbf{y}, \tag{6.8} $$

where $\mathbf{f} = ( f_1, \ldots, f_n )^T$. The remarkable fact is that the force balance coefficient matrix

$$
A^T = \begin{pmatrix}
1 & -1 & & & & \\
 & 1 & -1 & & & \\
 & & 1 & -1 & & \\
 & & & \ddots & \ddots & \\
 & & & & 1 & -1
\end{pmatrix}
\tag{6.9}
$$

is the transpose of the reduced incidence matrix (6.4) for the chain. This connection
between geometry and force balance turns out to be of almost universal applicability, and
is the reason underlying the positivity of the final coefficient matrix in the resulting system
of equilibrium equations.

Summarizing, the basic geometrical and physical properties of our mechanical system
lead us to the full system of equilibrium equations (6.3, 6, 8) relating its displacements $\mathbf{u}$,
elongations $\mathbf{e}$, internal forces $\mathbf{y}$, and external forces $\mathbf{f}$:

$$ \mathbf{e} = A\,\mathbf{u}, \qquad \mathbf{y} = C\,\mathbf{e}, \qquad \mathbf{f} = A^T \mathbf{y}. \tag{6.10} $$

These equations imply $\mathbf{f} = A^T \mathbf{y} = A^T C\,\mathbf{e} = A^T C A\,\mathbf{u}$, and hence can be combined into a
single linear system

$$ K\,\mathbf{u} = \mathbf{f}, \qquad \text{where} \qquad K = A^T C A \tag{6.11} $$

is called the stiffness matrix associated with the entire mass-spring chain. In the particular
case under consideration,

$$
K = \begin{pmatrix}
c_1 + c_2 & -c_2 & & & & & \\
-c_2 & c_2 + c_3 & -c_3 & & & & \\
 & -c_3 & c_3 + c_4 & -c_4 & & & \\
 & & -c_4 & c_4 + c_5 & -c_5 & & \\
 & & & \ddots & \ddots & \ddots & \\
 & & & & -c_{n-1} & c_{n-1} + c_n & -c_n \\
 & & & & & -c_n & c_n + c_{n+1}
\end{pmatrix}
\tag{6.12}
$$

has a very simple symmetric, tridiagonal form. As such, we can use the tridiagonal solution
algorithm of Section 1.7 to rapidly solve the linear system (6.11) for the displacements of
the masses. Once we have solved (6.11) for the displacements $\mathbf{u}$ we can then compute the
resulting elongations $\mathbf{e}$ and internal forces $\mathbf{y}$ by substituting into the original system (6.10).
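The solution procedure just described can be sketched as follows for a chain with both ends fixed. This is a minimal illustration in exact rational arithmetic, not the specific algorithm of Section 1.7: it uses plain elimination without pivoting on the tridiagonal matrix (6.12). The function names and the storage of $K$ as a diagonal list plus an off-diagonal list are assumptions of this sketch.

```python
from fractions import Fraction


def stiffness_fixed_ends(c):
    """Diagonal and off-diagonal of K = A^T C A in (6.12), given stiffnesses c_1..c_{n+1}."""
    n = len(c) - 1
    diag = [Fraction(c[i]) + Fraction(c[i + 1]) for i in range(n)]   # c_i + c_{i+1}
    off = [-Fraction(c[i + 1]) for i in range(n - 1)]                # -c_{i+1}
    return diag, off


def solve_tridiagonal(diag, off, f):
    """Solve the symmetric tridiagonal system K u = f by elimination and back substitution."""
    n = len(diag)
    d = list(diag)
    b = [Fraction(x) for x in f]
    for i in range(1, n):                       # forward elimination
        factor = off[i - 1] / d[i - 1]
        d[i] -= factor * off[i - 1]
        b[i] -= factor * b[i - 1]
    u = [Fraction(0)] * n
    u[n - 1] = b[n - 1] / d[n - 1]
    for i in range(n - 2, -1, -1):              # back substitution
        u[i] = (b[i] - off[i] * u[i + 1]) / d[i]
    return u


if __name__ == "__main__":
    diag, off = stiffness_fixed_ends([1, 1, 1, 1])      # three masses, unit springs
    u = solve_tridiagonal(diag, off, [0, 1, 0])         # unit force on the middle mass
    print([str(x) for x in u])
```

The elongations and internal forces then follow from $\mathbf{e} = A\,\mathbf{u}$ and $\mathbf{y} = C\,\mathbf{e}$.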


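Example 6.1. Let us consider the particular case of $n = 3$ masses connected by identical
springs with unit spring constant. Thus, $c_1 = c_2 = c_3 = c_4 = 1$, and $C = \operatorname{diag}(1, 1, 1, 1) = I$
is the $4 \times 4$ identity matrix. The $3 \times 3$ stiffness matrix is then

$$
K = A^T A =
\begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix}
=
\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}.
$$

A straightforward Gaussian Elimination produces the $K = L D L^T$ factorization

$$
\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ -\frac12 & 1 & 0 \\ 0 & -\frac23 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 0 & 0 \\ 0 & \frac32 & 0 \\ 0 & 0 & \frac43 \end{pmatrix}
\begin{pmatrix} 1 & -\frac12 & 0 \\ 0 & 1 & -\frac23 \\ 0 & 0 & 1 \end{pmatrix}.
\tag{6.13}
$$

With this in hand, we can solve the basic equilibrium equations $K \mathbf{u} = \mathbf{f}$ by the usual
Forward and Back Substitution algorithm.

Remark. Even though we construct $K = A^T C A$ and then factor it as $K = L D L^T$, there
is no direct algorithm to get from $A$ and $C$ to $L$ and $D$, which, typically, are matrices of
different sizes.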
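The factorization (6.13) can be reproduced with a short routine. The sketch below assumes a symmetric matrix that can be factored without row interchanges (as is the case for positive definite $K$) and uses exact fractions. The function name `ldlt` is illustrative.

```python
from fractions import Fraction


def ldlt(K):
    """Return (L, D) with K = L D L^T, L unit lower triangular, D given by its diagonal."""
    n = len(K)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    D = [Fraction(0)] * n
    for j in range(n):
        # Pivot: subtract the contributions of the columns already processed.
        D[j] = Fraction(K[j][j]) - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (Fraction(K[i][j])
                       - sum(L[i][k] * L[j][k] * D[k] for k in range(j))) / D[j]
    return L, D


if __name__ == "__main__":
    L, D = ldlt([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
    print([str(d) for d in D])
    for row in L:
        print([str(x) for x in row])
```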

Suppose, for example, we pull the middle mass downwards with a unit force, so $f_2 = 1$
while $f_1 = f_3 = 0$. Then $\mathbf{f} = ( 0, 1, 0 )^T$, and the solution to the equilibrium equations
(6.11) is $\mathbf{u} = \bigl( \frac12, 1, \frac12 \bigr)^T$, whose entries prescribe the mass displacements. Observe that
all three masses have moved down, with the middle mass moving twice as far as the other
two. The corresponding spring elongations and internal forces are obtained by matrix
multiplication

$$ \mathbf{y} = \mathbf{e} = A\,\mathbf{u} = \bigl( \tfrac12, \tfrac12, -\tfrac12, -\tfrac12 \bigr)^T, $$

since $C = I$. Thus, the top two springs are elongated, while the bottom two are compressed,
all by an equal amount.

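Similarly, if all the masses are equal, $m_1 = m_2 = m_3 = m$, then the solution under a
constant downwards gravitational force $\mathbf{f} = ( m g, m g, m g )^T$ is

$$
\mathbf{u} = K^{-1} \begin{pmatrix} m g \\ m g \\ m g \end{pmatrix}
= \begin{pmatrix} \frac32\, m g \\ 2\, m g \\ \frac32\, m g \end{pmatrix},
\qquad \text{and} \qquad
\mathbf{y} = \mathbf{e} = A\,\mathbf{u} = \begin{pmatrix} \frac32\, m g \\ \frac12\, m g \\ -\frac12\, m g \\ -\frac32\, m g \end{pmatrix}.
$$

Now, the middle mass has only moved 33% farther than the others, whereas the top
and bottom springs are experiencing three times as much elongation/compression as the
middle two springs.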

An important observation is that we cannot determine the internal forces $\mathbf{y}$ or
elongations $\mathbf{e}$ directly from the force balance law (6.8), because the transposed matrix $A^T$ is
not square, and so the system $\mathbf{f} = A^T \mathbf{y}$ does not have a unique solution. We must first
compute the displacements $\mathbf{u}$ by solving the full equilibrium equations (6.11), and then
use the resulting displacements to reconstruct the elongations and internal forces. Such
systems are referred to as statically indeterminate.
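The non-uniqueness is easy to see numerically: for the chain with both ends fixed, adding the same constant to every internal force leaves the force balance unchanged. The sketch below applies $A^T$ componentwise as $f_i = y_i - y_{i+1}$. The function name is hypothetical, and the shifted vector is just one illustrative alternative solution.

```python
from fractions import Fraction


def force_balance_fixed_ends(y):
    """External forces f_i = y_i - y_{i+1}, i.e. A^T y for a chain with both ends fixed."""
    return [y[i] - y[i + 1] for i in range(len(y) - 1)]


if __name__ == "__main__":
    half = Fraction(1, 2)
    y = [half, half, -half, -half]          # internal forces found in Example 6.1
    shifted = [yi + 7 for yi in y]          # a different vector with the same A^T y
    print(force_balance_fixed_ends(y) == force_balance_fixed_ends(shifted))
```

Only the first vector is of the form $\mathbf{y} = C A\,\mathbf{u}$ for some displacement $\mathbf{u}$, which is why the displacements must be found first.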

The behavior of the system will depend on both the forcing and the boundary conditions.
Suppose, by way of contrast, that we fix only the top of the chain to a support, and leave
the bottom mass hanging freely, as in Figure 6.4. The geometric relation between the
displacements and the elongations has the same form (6.3) as before, but the reduced
incidence matrix is slightly altered:

$$
A = \begin{pmatrix}
1 & & & & \\
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1
\end{pmatrix}.
\tag{6.14}
$$

This matrix has size $n \times n$ and is obtained from the preceding example (6.4) by eliminating
the last row corresponding to the missing bottom spring. The constitutive equations are
still governed by Hooke’s law $\mathbf{y} = C\,\mathbf{e}$, as in (6.6), with $C = \operatorname{diag}( c_1, \ldots, c_n )$ the $n \times n$
diagonal matrix of spring stiffnesses. Finally, the force balance equations are also found
to have the same general form $\mathbf{f} = A^T \mathbf{y}$ as in (6.8), but with the transpose of the revised
incidence matrix (6.14). In conclusion, the equilibrium equations $K \mathbf{u} = \mathbf{f}$ have an identical
form (6.11), based on the revised stiffness matrix

$$
K = A^T C A = \begin{pmatrix}
c_1 + c_2 & -c_2 & & & & & \\
-c_2 & c_2 + c_3 & -c_3 & & & & \\
 & -c_3 & c_3 + c_4 & -c_4 & & & \\
 & & -c_4 & c_4 + c_5 & -c_5 & & \\
 & & & \ddots & \ddots & \ddots & \\
 & & & & -c_{n-1} & c_{n-1} + c_n & -c_n \\
 & & & & & -c_n & c_n
\end{pmatrix}.
\tag{6.15}
$$

Only the bottom right entry differs from the fixed end matrix (6.12).

Figure 6.4. A Mass-Spring Chain with One Free End.

This system is called statically determinate, because the incidence matrix $A$ is square
and nonsingular, and so it is possible to solve the force balance law (6.8) directly for the
internal forces $\mathbf{y} = A^{-T} \mathbf{f}$ without having to solve the full equilibrium equations for the
displacements $\mathbf{u}$ before computing the internal forces $\mathbf{y} = C A\,\mathbf{u}$.
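For the chain with one free end, the force balance equations read $f_n = y_n$ and $f_i = y_i - y_{i+1}$ for $i < n$, so they can be solved from the bottom up. The following is a minimal sketch of that back substitution. The function name `internal_forces_free_end` is hypothetical.

```python
def internal_forces_free_end(f):
    """Solve A^T y = f for a chain fixed at the top only: y_i = f_i + f_{i+1} + ... + f_n."""
    y = [0] * len(f)
    running = 0
    for i in range(len(f) - 1, -1, -1):   # work upward from the free bottom mass
        running += f[i]
        y[i] = running
    return y


if __name__ == "__main__":
    print(internal_forces_free_end([0, 1, 0]))   # unit force on the middle mass
    print(internal_forces_free_end([1, 1, 1]))   # equal weights, in units of m g
```

Each spring carries the total external force applied to the masses below it, so no knowledge of the displacements or stiffnesses is needed.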


Example 6.2. For a three mass chain with one free end and equal unit spring constants
$c_1 = c_2 = c_3 = 1$, the stiffness matrix is

$$
K = A^T A =
\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}
=
\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}.
$$

Pulling the middle mass downwards with a unit force, whereby $\mathbf{f} = ( 0, 1, 0 )^T$, results in
the displacements

$$ \mathbf{u} = K^{-1} \mathbf{f} = ( 1, 2, 2 )^T, \qquad \text{so that} \qquad \mathbf{y} = \mathbf{e} = A\,\mathbf{u} = ( 1, 1, 0 )^T. $$

In this configuration, the bottom two masses have moved by the same amount, which is
twice as far as the top mass. Because we are pulling only on the middle mass, the bottom
spring hangs free and experiences no elongation, whereas the top two springs are stretched
by the same amount.

Similarly, for a chain of equal masses subject to a constant downwards gravitational
force $\mathbf{f} = ( m g, m g, m g )^T$, the equilibrium position is

$$
\mathbf{u} = K^{-1} \begin{pmatrix} m g \\ m g \\ m g \end{pmatrix}
= \begin{pmatrix} 3\, m g \\ 5\, m g \\ 6\, m g \end{pmatrix},
\qquad \text{and} \qquad
\mathbf{y} = \mathbf{e} = A\,\mathbf{u} = \begin{pmatrix} 3\, m g \\ 2\, m g \\ m g \end{pmatrix}.
$$

Note how much farther the masses have moved now that the restraining influence of the
bottom support has been removed. The top spring is experiencing the most elongation,
and is thus the most likely to break, because it must support all three masses.


Exercises 


6.1.1. A mass-spring chain consists of two masses connected to two fixed supports. The spring
constants are $c_1 = c_3 = 1$ and $c_2 = 2$. (a) Find the stiffness matrix $K$. (b) Solve the
equilibrium equations $K \mathbf{u} = \mathbf{f}$ when $\mathbf{f} = ( 4, 3 )^T$. (c) Which mass moved the farthest?
(d) Which spring has been stretched the most? Compressed the most?

6.1.2.  Solve  Exercise  6.1.1  when  the  first  and  second  springs  are  interchanged,  c1  =  2, 
c2  =  c3  =  1.  Which  of  your  conclusions  changed? 

6.1.3. Redo Exercises 6.1.1-2 when the bottom support and spring are removed.

6.1.4.  A  mass-spring  chain  consists  of  four  masses  suspended  between  two  fixed  supports. 

The  spring  stiffnesses  are  c1  =  1,  c2  =  c3  =  |,  c4  =  c5  =  1.  (a)  Determine  the 
equilibrium  positions  of  the  masses  and  the  elongations  of  the  springs  when  the  external 
force is $f = (0, 1, 1, 0)^T$. Is your solution unique? (b) Suppose we fix only the top support.
Solve  the  problem  with  the  same  data  and  compare  your  results. 


6.1.5. (a) Show that, in a mass-spring chain with two fixed ends, under any external force, the
average elongation of the springs is zero: $\frac{1}{n+1}(e_1 + \cdots + e_{n+1}) = 0$. (b) What can you say
about the average elongation of the springs in a chain with one fixed end?


0 6.1.6. Suppose we subject the $i$th mass (and no others) in a chain to a unit force, and then
measure the resulting displacement of the $j$th mass. Prove that this is the same as the
displacement of the $i$th mass when the chain is subject to a unit force on the $j$th mass.
Hint: See Exercise 1.6.20.


X 6.1.7. Find the displacements $u_1, u_2, \ldots, u_{100}$ of 100 masses connected in a row by identical
springs, with spring constant $c = 1$. Consider the following three types of force functions:
(a) Constant force: $f_1 = \cdots = f_{100} = .01$; (b) Linear force: $f_i = .0002\, i$; (c) Quadratic
force: $f_i = 6 \cdot 10^{-6}\, i\, (100 - i)$. Also consider two different boundary conditions at the
bottom: (i) spring 101 connects the last mass to a support; (ii) mass 100 hangs free at the
end of the line of springs. Graph the displacements and elongations in all six cases. Discuss
your results; in particular, comment on whether they agree with your physical intuition.

6.1.8. (a) Suppose you are given three springs with respective stiffnesses $c = 1$, $c' = 2$, $c'' = 3$.
In  what  order  should  you  connect  them  to  three  masses  and  a  top  support  so  that  the 
bottom  mass  goes  down  the  farthest  under  a  uniform  gravitational  force? 

(b)  Answer  Exercise  6.1.8  when  the  springs  connect  two  masses  to  top  and  bottom  supports. 

X 6.1.9. Generalizing Exercise 6.1.8, suppose you are given $n$ different springs. (a) In which
order should you connect them to $n$ masses and a top support so that the bottom mass
goes down the farthest under a uniform gravitational force? Does your answer depend upon
the relative sizes of the spring constants? (b) Answer the same question when the springs
connect $n - 1$ masses to both top and bottom supports.

6.1.10. Find the $L D L^T$ factorization of an $n \times n$ tridiagonal matrix whose diagonal entries are
all equal to 2 and whose sub- and super-diagonal entries are all equal to $-1$. Hint: Start
with the $3 \times 3$ case (6.13), and then analyze a slightly larger one to spot the pattern.

T 6.1.11. In a statically indeterminate situation, the equations $A^T y = f$ do not have a unique
solution for the internal forces $y$ in terms of the external forces $f$. (a) Prove that,
nevertheless, if $C = I$, the internal forces are the unique solution of minimal Euclidean
norm, as given by Theorem 4.50. (b) Use this method to directly find the internal force
for the system in Example 6.1. Make sure that your values agree with those in the example.


Positive  Definiteness  and  the  Minimization  Principle 

You may have already observed that the stiffness matrix $K = A^T C A$ of a mass-spring
chain has the form of a Gram matrix, cf. (3.64), for the weighted inner product
$\langle v, w \rangle = v^T C w$ induced by the diagonal matrix of spring stiffnesses. Moreover, since $A$
has linearly independent columns (which should be checked), and $C$ is positive definite,
Theorem 3.37 tells us that the stiffness matrix is positive definite: $K > 0$. In particular,
Theorem 3.43 guarantees that $K$ is nonsingular, and hence the linear system (6.11) has a
unique solution $u = K^{-1} f$. We can therefore conclude that the mass-spring chain assumes a
unique equilibrium position under an arbitrary external force. However, one must keep in mind
that this is a mathematical result and may not hold in all physical situations. Indeed, we
should anticipate that a very large force will take us outside the regime covered by the
linear Hooke’s law relation (6.5), and render our simple mathematical model physically
irrelevant.
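
Positive definiteness of a particular stiffness matrix can also be tested directly. The sketch below relies on the standard criterion that a symmetric matrix is positive definite exactly when Gaussian elimination without row interchanges produces only positive pivots. The function names are ours, and the third matrix, a chain with no supports at all, is a hypothetical counterexample that does not appear in the text.

```python
from fractions import Fraction

def pivots(K):
    """Pivots of Gaussian elimination without row interchanges.
    Stops as soon as a non-positive pivot is met (that pivot is included)."""
    n = len(K)
    M = [[Fraction(x) for x in row] for row in K]
    found = []
    for j in range(n):
        found.append(M[j][j])
        if M[j][j] <= 0:
            break
        for i in range(j + 1, n):
            m = M[i][j] / M[j][j]
            M[i] = [a - m * b for a, b in zip(M[i], M[j])]
    return found

def is_positive_definite(K):
    """True when the symmetric matrix K has a full set of positive pivots."""
    p = pivots(K)
    return len(p) == len(K) and all(x > 0 for x in p)

K_two_fixed_ends = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
K_one_free_end = [[2, -1, 0], [-1, 2, -1], [0, -1, 1]]
K_no_supports = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]   # hypothetical: both ends free

for name, K in [("two fixed ends", K_two_fixed_ends),
                ("one free end", K_one_free_end),
                ("no supports", K_no_supports)]:
    print(name, [str(x) for x in pivots(K)], is_positive_definite(K))
```

The unsupported chain fails the test because a rigid translation of all the masses stretches no spring, so its stiffness matrix is only positive semi-definite.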

According to Theorem 5.2, when the coefficient matrix of a linear system is positive
definite, the equilibrium solution can be characterized by a minimization principle. For
mass-spring chains, the quadratic function to be minimized has a physical interpretation:
it is the potential energy of the system. Nature is parsimonious with energy, so a
physical system seeks out an energy-minimizing equilibrium configuration. Energy
minimization principles are of almost universal validity, and can be advantageously used for
the construction of mathematical models, as well as their solutions, both analytical and
numerical.

The energy function to be minimized can be determined directly from physical principles.
For a mass-spring chain, the potential energy of the $i$th mass equals the product
of the applied force and the displacement: $-f_i u_i$. The minus sign is the result of our
convention that a positive displacement $u_i > 0$ means that the mass has moved down,
and hence decreased its potential energy. Thus, the total potential energy due to external
forcing on all the masses in the chain is

$$-\sum_{i=1}^{n} f_i u_i = -u^T f.$$

Next, we calculate the internal energy of the system. In a single spring elongated by an
amount $e$, the work done by the internal forces $y = c\, e$ is stored as potential energy, and
so is calculated by integrating the force over the elongated distance:

$$\int_0^e y \, de = \int_0^e c\, e \, de = \tfrac12\, c\, e^2.$$

Totaling the contributions from each spring, we find the internal spring energy to be

$$\frac12 \sum_{i=1}^{n} c_i e_i^2 = \tfrac12\, e^T C e = \tfrac12\, u^T A^T C A u = \tfrac12\, u^T K u,$$

where we used the incidence equation $e = A u$ relating elongation and displacement.
Therefore, the total potential energy is

$$p(u) = \tfrac12\, u^T K u - u^T f. \tag{6.16}$$

Since $K > 0$, Theorem 5.2 implies that this quadratic function has a unique minimizer
that satisfies the equilibrium equation $K u = f$.

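Example 6.3. For the three mass chain with two fixed ends described in Example 6.1,
the potential energy function (6.16) has the explicit form

$$\begin{aligned} p(u) &= \tfrac12 \begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix} \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} - \begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix} \\ &= u_1^2 - u_1 u_2 + u_2^2 - u_2 u_3 + u_3^2 - f_1 u_1 - f_2 u_2 - f_3 u_3, \end{aligned}$$

where $f = (f_1, f_2, f_3)^T$ is the external forcing. The minimizer of this particular quadratic
function gives the equilibrium displacements $u = (u_1, u_2, u_3)^T$ of the three masses.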
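
The minimization principle is easy to test numerically. The following Python sketch evaluates the energy of Example 6.3 for one hypothetical forcing, $f = (0, 1, 0)^T$, chosen purely for illustration, and compares the equilibrium energy with the energy of a few perturbed configurations. The trial perturbations and helper names are our own.

```python
from fractions import Fraction

K = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]   # two fixed ends, unit springs
f = [0, 1, 0]                                # hypothetical forcing

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def mat_vec(M, x):
    return [dot(row, x) for row in M]

def p(u):
    """Potential energy (6.16): p(u) = 1/2 u^T K u - u^T f."""
    return Fraction(1, 2) * dot(u, mat_vec(K, u)) - dot(u, f)

def p_expanded(u):
    """The same energy, written out term by term as in Example 6.3."""
    u1, u2, u3 = u
    return u1**2 - u1*u2 + u2**2 - u2*u3 + u3**2 - dot(f, u)

u_star = [Fraction(1, 2), Fraction(1), Fraction(1, 2)]
assert mat_vec(K, u_star) == f               # u_star solves K u = f

trials = [[1, 0, 0], [0, -1, 0],
          [Fraction(1, 10), Fraction(-1, 5), Fraction(3, 10)]]
for d in trials:
    u = [a + b for a, b in zip(u_star, d)]
    gap = p(u) - p(u_star)
    # The energy gap equals 1/2 d^T K d, which is positive whenever d != 0.
    assert gap == Fraction(1, 2) * dot(d, mat_vec(K, d)) and gap > 0
    assert p(u) == p_expanded(u)

print(p(u_star))   # energy at equilibrium
```

Every perturbation raises the energy by exactly $\tfrac12\, d^T K d$, which is the algebraic reason why positive definiteness of $K$ forces the equilibrium to be the unique minimizer.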


Exercises 

6.1.12.  Prove  directly  that  the  stiffness  matrices  in  Examples  6.1  and  6.2  are  positive  definite. 

6.1.13.  Write  down  the  potential  energy  for  the  following  mass-spring  chains  with  identical 
unit  springs  when  subject  to  a  uniform  gravitational  force:  (a)  three  identical  masses 
connected  to  only  a  top  support,  (b)  four  identical  masses  connected  to  top  and  bottom 
supports,  (c)  four  identical  masses  connected  only  to  a  top  support. 

6.1.14.  (a)  Find  the  total  potential  energy  of  the  equilibrium  configuration  of  the  mass-spring 
chain  in  Exercise  6.1.1.  (b)  Test  the  minimum  principle  by  substituting  three  other  possible 
displacements  of  the  masses  and  checking  that  they  all  have  larger  potential  energy. 

6.1.15.  Answer  Exercise  6.1.14  for  the  mass-spring  chain  in  Exercise  6.1.4. 

6.1.16. Describe the mass-spring chains that give rise to the following potential energy
functions,  and  find  their  equilibrium  configuration:  (a)  3 u\  —  Au1  u2  +  3u2  +  u1  —3 a2, 

(b)  bu1  6u1  u2  — 3 — |—  2 u2 ,  (c)  2 u-^  3u-^  - (—  /-^u2  -I-  2  a^  a — f-  a^, 

(d)  2  a1  —  a1  u2  u2  —  ^2  ^3  4~  ^3  —  ^3  ^4  T  2  a^  a-^  —  2  a^ . 

6.1.17.  Explain  why  the  columns  of  the  reduced  incidence  matrices  (6.4)  and  (6.14)  are  linearly 
independent. 

6.1.18. Suppose that when subject to a nonzero external force $f \ne 0$, a mass-spring chain has
equilibrium position $u^\star$. Prove that the potential energy is strictly negative at equilibrium:
$p(u^\star) < 0$.

T  6.1.19.  Return  to  the  situation  investigated  in  Exercise  6.1.8.  How  should  you  arrange  the 
springs  in  order  to  minimize  the  potential  energy  in  the  resulting  mass-spring  chain? 

6.1.20. True or false: The potential energy function uniquely determines the mass-spring
chain.

Figure 6.5. A Simple Electrical Network.

6.2  Electrical  Networks 

By an electrical network, we mean a collection of (insulated) wires that are joined together
at  their  ends.  The  junctions  connecting  the  ends  of  one  or  more  wires  are  called  nodes. 
Mathematically,  we  can  view  any  such  electrical  network  as  a  graph,  the  wires  being  the 
edges  and  the  nodes  the  vertices.  As  before,  to  avoid  technicalities,  we  will  assume  that 
the underlying graph is simple, meaning that there are no loops and at most one edge
connecting  any  two  vertices.  To  begin  with,  we  further  assume  that  there  are  no  electrical 
devices  (batteries,  inductors,  capacitors,  etc.)  in  the  network,  and  so  the  only  impediments 
to  the  current  flowing  through  the  network  are  the  resistances  in  the  wires.  As  we  shall 
see,  resistance  (or  rather  its  reciprocal)  plays  a  very  similar  role  to  that  of  spring  stiffness. 
Thus,  the  network  corresponds  to  a  weighted  graph  in  which  the  weight  of  an  edge  is  the 
number  representing  the  resistance  of  the  corresponding  wire.  We  shall  feed  a  current  into 
the  network  at  one  or  more  of  the  nodes,  and  would  like  to  determine  how  the  induced 
current  flows  through  the  wires.  The  basic  equations  governing  the  equilibrium  voltages  and 
currents  in  such  a  network  follow  from  the  three  fundamental  laws  of  electricity,  named 
after  the  pioneering  nineteenth-century  German  physicists  Gustav  Kirchhoff  and  Georg 
Ohm,  two  of  the  founders  of  electric  circuit  theory,  [58]. 

Voltage  is  defined  as  the  electromotive  force  that  moves  electrons  through  a  wire.  An 
individual  wire’s  voltage  is  determined  by  the  difference  in  the  voltage  potentials  at  its  two 
ends  —  just  as  the  gravitational  force  on  a  mass  is  induced  by  a  difference  in  gravitational 
potential.  To  quantify  voltage,  we  need  to  fix  an  orientation  for  the  wire.  A  positive  voltage 
will  mean  that  the  electrons  move  in  the  chosen  direction,  while  a  negative  voltage  causes 
them  to  move  in  reverse.  The  original  choice  of  orientation  is  arbitrary,  but  once  assigned 
will  pin  down  the  sign  conventions  to  be  used  by  voltages,  currents,  etc.  To  this  end,  we 
draw  a  digraph  to  represent  the  network,  whose  edges  represent  wires  and  whose  vertices 
represent  nodes.  Each  edge  is  assigned  an  orientation  that  indicates  the  wire’s  starting  and 
ending  nodes.  A  simple  example  consisting  of  five  wires  joined  at  four  different  nodes  can 
be  seen  in  Figure  6.5.  The  arrows  indicate  the  selected  directions  for  the  wires,  the  wavy 
lines  are  the  standard  electrical  symbols  for  resistance,  while  the  resistances  provide  the 
edge  weights  in  the  resulting  weighted  digraph. 

In an electrical network, each node will have a voltage potential, denoted by $u_i$. If wire
$k$ starts at node $i$ and ends at node $j$ under its assigned orientation, then its voltage $v_k$
equals the difference between the voltage potentials at its ends:

$$v_k = u_i - u_j. \tag{6.17}$$

Note that $v_k > 0$ if $u_i > u_j$, indicating that the electrons flow from the starting node $i$
to the ending node $j$. In our particular illustrative example, the five wires have respective
voltages

$$v_1 = u_1 - u_2, \quad v_2 = u_1 - u_3, \quad v_3 = u_1 - u_4, \quad v_4 = u_2 - u_4, \quad v_5 = u_3 - u_4.$$

Let us rewrite this linear system of equations in vector form

$$v = A u, \tag{6.18}$$

where

$$A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{pmatrix}. \tag{6.19}$$



The alert reader will recognize the incidence matrix (2.46) for the digraph defined by the
network. This is true in general: the voltages along the wires of an electrical network are
related to the potentials at the nodes by a linear system of the form (6.18), in which $A$ is
the incidence matrix of the network digraph. The rows of the incidence matrix are indexed
by the wires, and the columns by the nodes. Each row of the matrix $A$ has a single $+1$
in the column indexed by the starting node of the associated wire, and a single $-1$ in the
column of the ending node.
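
The rule just stated translates directly into code. As a sketch, the function below (the name `incidence_matrix` and the edge-list representation are our own choices) builds $A$ from a list of oriented wires, given as pairs (starting node, ending node) with nodes numbered from 1. The potentials `u` are hypothetical values used only to exercise the relation $v = A u$.

```python
def incidence_matrix(edges, n_nodes):
    """Incidence matrix of a digraph: one row per wire, one column per node.
    Each row has +1 at the starting node and -1 at the ending node."""
    A = []
    for start, end in edges:
        row = [0] * n_nodes
        row[start - 1] = 1
        row[end - 1] = -1
        A.append(row)
    return A

# The five oriented wires of the sample network, read off from the voltages
# v1 = u1 - u2, v2 = u1 - u3, v3 = u1 - u4, v4 = u2 - u4, v5 = u3 - u4.
edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]
A = incidence_matrix(edges, 4)

u = [5, 3, 2, 0]   # hypothetical nodal potentials
v = [sum(a * x for a, x in zip(row, u)) for row in A]   # wire voltages v = A u

for row in A:
    print(row)
print(v)
```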

Kirchhoff’s Voltage Law states that the sum of the voltages around each closed circuit
in the network is zero. For example, in the network under consideration, summing the
voltages around the left-hand triangular circuit gives

$$v_1 + v_4 - v_3 = (u_1 - u_2) + (u_2 - u_4) - (u_1 - u_4) = 0.$$

Note that $v_3$ appears with a minus sign, since we must traverse wire 3 in the opposite
direction to its assigned orientation when going around the circuit in the counterclockwise
direction. The voltage law is a direct consequence of (6.18). Indeed, as discussed in
Section 2.6, the circuits can be identified with vectors $\ell \in \operatorname{coker} A = \ker A^T$ in the cokernel
of the incidence matrix, and so

$$\ell \cdot v = \ell^T v = \ell^T A u = 0. \tag{6.20}$$

Therefore, orthogonality of the voltage vector $v$ to the circuit vector $\ell$ is the mathematical
formalization of Kirchhoff’s Voltage Law.

Given a prescribed set of voltages $v$ along the wires, can one find corresponding voltage
potentials $u$ at the nodes? To answer this question, we need to solve $v = A u$, which
requires $v \in \operatorname{img} A$. According to the Fredholm Alternative Theorem 4.46, the necessary
and sufficient condition for this to hold is that $v$ be orthogonal to $\operatorname{coker} A$. Theorem 2.53
says that the cokernel of an incidence matrix is spanned by the circuit vectors, and so $v$ is
a possible set of voltages if and only if $v$ is orthogonal to all the circuit vectors $\ell \in \operatorname{coker} A$,
i.e., the Voltage Law is necessary and sufficient for the given voltages to be physically
realizable in the network.
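
The realizability criterion can be illustrated in Python for the sample network. In this sketch the first circuit vector is the left-hand triangle discussed above; the second one (wires 2 and 5 forwards, wire 3 backwards) is inferred by us from the incidence matrix rather than taken from the figure. Since the cokernel of this particular $A$ is two-dimensional, these two vectors span it. The trial voltages are hypothetical.

```python
A = [[1, -1, 0, 0], [1, 0, -1, 0], [1, 0, 0, -1], [0, 1, 0, -1], [0, 0, 1, -1]]

# Circuit vectors: entry k is +1 or -1 if wire k is traversed along or against
# its orientation, and 0 if the wire is not part of the circuit.
loops = [[1, 0, -1, 1, 0],    # v1 + v4 - v3
         [0, 1, -1, 0, 1]]    # v2 + v5 - v3 (inferred from A)

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def transpose_times(A, y):
    """Compute A^T y without forming the transpose explicitly."""
    return [sum(A[k][i] * y[k] for k in range(len(A))) for i in range(len(A[0]))]

def satisfies_voltage_law(v):
    """True when v is orthogonal to every circuit vector in the list."""
    return all(dot(loop, v) == 0 for loop in loops)

# Each circuit vector lies in coker A = ker A^T.
for loop in loops:
    assert transpose_times(A, loop) == [0, 0, 0, 0]

u = [5, 3, 2, 0]                                   # hypothetical potentials
v_good = [dot(row, u) for row in A]                # v = A u, realizable
v_bad = [1, 1, 1, 1, 1]                            # violates the voltage law

print(v_good, satisfies_voltage_law(v_good))
print(v_bad, satisfies_voltage_law(v_bad))
```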

Kirchhoff’s Law is related to the topology of the network, that is, how the different wires
are connected together. Ohm’s Law is a constitutive relation, indicating what the wires are
made of. The resistance along a wire (including any added resistors) prescribes the relation
between voltage and current or the rate of flow of electric charge. The law reads

$$v_k = R_k y_k, \tag{6.21}$$

where $v_k$ is the voltage, $R_k$ is the resistance, and $y_k$ (often denoted by $I_k$ in the engineering
literature) denotes the current along wire $k$. Thus, for a fixed voltage, the larger the
resistance of the wire, the smaller the current that flows through it. The direction of
the current is also prescribed by our choice of orientation of the wire, so that $y_k > 0$ if
the current is flowing from the starting to the ending node. We combine the individual
equations (6.21) into a single vector equation

$$v = R y, \tag{6.22}$$

where the resistance matrix $R = \operatorname{diag}(R_1, \ldots, R_n) > 0$ is diagonal and positive definite.
We shall, in analogy with (6.6), replace (6.22) by the inverse relationship

$$y = C v, \tag{6.23}$$

where $C = R^{-1}$ is the conductance matrix, again diagonal, positive definite, whose entries
are the conductances $c_k = 1/R_k$ of the wires. For the particular network in Figure 6.5,

$$C = \begin{pmatrix} 1/R_1 & 0 & 0 & 0 & 0 \\ 0 & 1/R_2 & 0 & 0 & 0 \\ 0 & 0 & 1/R_3 & 0 & 0 \\ 0 & 0 & 0 & 1/R_4 & 0 \\ 0 & 0 & 0 & 0 & 1/R_5 \end{pmatrix}. \tag{6.24}$$



Finally, we stipulate that electric current is not allowed to accumulate at any node, i.e.,
every electron that arrives at a node must leave along one of the wires. Let $y_k, y_l, \ldots, y_m$
denote the currents along all the wires $k, l, \ldots, m$ that meet at node $i$ in the network, and
$f_i$ an external current source, if any, applied at node $i$. Kirchhoff’s Current Law requires
that the net current leaving the node along the wires equals the external current coming
into the node, and so

$$\pm\, y_k \pm y_l \pm \cdots \pm y_m = f_i. \tag{6.25}$$

Each $\pm$ sign is determined by the orientation of the wire, with $+$ if node $i$ is its starting
node and $-$ if it is its ending node.

In our particular example, suppose that we send a 1 amp current source into the first
node. Then Kirchhoff’s Current Law requires

$$y_1 + y_2 + y_3 = 1, \quad -y_1 + y_4 = 0, \quad -y_2 + y_5 = 0, \quad -y_3 - y_4 - y_5 = 0,$$

the four equations corresponding to the four nodes in our network. The vector form of this
linear system is

$$A^T y = f, \tag{6.26}$$

where $y = (y_1, y_2, y_3, y_4, y_5)^T$ are the currents along the five wires, and $f = (1, 0, 0, 0)^T$
represents the current sources at the four nodes. The coefficient matrix

$$A^T = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 & 1 \\ 0 & 0 & -1 & -1 & -1 \end{pmatrix} \tag{6.27}$$

is the transpose of the incidence matrix (6.19). As in the mass-spring chain, this is a
remarkable general fact, which follows directly from Kirchhoff’s two laws. The coefficient
matrix for the Current Law is the transpose of the incidence matrix for the Voltage Law.
Let us assemble the full system of equilibrium equations (6.18, 23, 26):

$$v = A u, \qquad y = C v, \qquad f = A^T y. \tag{6.28}$$

Remarkably, we arrive at a system of linear relations that has an identical form to the
mass-spring chain system (6.10), albeit with different physical quantities and different
coefficient matrices. As before, they combine into a single linear system

$$K u = f, \qquad \text{where} \qquad K = A^T C A \tag{6.29}$$

is known as the resistivity matrix associated with the network. In our particular example,
combining (6.19, 24, 27) produces the resistivity matrix

$$K = A^T C A = \begin{pmatrix} c_1 + c_2 + c_3 & -c_1 & -c_2 & -c_3 \\ -c_1 & c_1 + c_4 & 0 & -c_4 \\ -c_2 & 0 & c_2 + c_5 & -c_5 \\ -c_3 & -c_4 & -c_5 & c_3 + c_4 + c_5 \end{pmatrix}, \tag{6.30}$$

whose entries depend on the conductances of the five wires in the network.

Remark. There is a simple pattern to the resistivity matrix, evident in (6.30). The
diagonal entries $k_{ii}$ equal the sum of the conductances of all the wires having node $i$ at one
end. The nonzero off-diagonal entries $k_{ij}$, $i \ne j$, equal $-c_k$, where $c_k$ is the conductance of
the wire† joining node $i$ to node $j$, while $k_{ij} = 0$ if there is no wire joining the two nodes.

Consider the case in which all the wires in our network have equal unit resistance, and
so $c_k = 1/R_k = 1$ for $k = 1, \ldots, 5$. Then the resistivity matrix is

$$K = \begin{pmatrix} 3 & -1 & -1 & -1 \\ -1 & 2 & 0 & -1 \\ -1 & 0 & 2 & -1 \\ -1 & -1 & -1 & 3 \end{pmatrix}. \tag{6.31}$$



However, when trying to solve the linear system (6.29), we run into an immediate difficulty:
there is no solution! The matrix (6.31) is not positive definite; it is a singular matrix.
Moreover, the particular current source vector $f = (1, 0, 0, 0)^T$ does not lie in the image
of $K$. Something is clearly amiss.

Before  getting  discouraged,  let  us  sit  back  and  use  a  little  physical  intuition.  We  are 
trying  to  put  a  1  amp  current  into  the  network  at  node  1.  Where  can  the  electrons  go? 
The  answer  is  nowhere  —  they  are  all  trapped  in  the  network  and,  as  they  accumulate, 
something  drastic  will  happen  —  sparks  will  fly!  This  is  clearly  an  unstable  situation,  and 
so  the  fact  that  the  equilibrium  equations  do  not  have  a  solution  is  trying  to  tell  us  that 
the  physical  system  cannot  remain  in  a  steady  state.  The  physics  rescues  the  mathematics, 
or,  vice  versa,  the  mathematics  elucidates  the  underlying  physical  processes. 

In order to achieve equilibrium in an electrical network, we must remove as much current
as we put in. Thus, if we feed a 1 amp current into node 1, then we must extract a total of
1 amp’s worth of current from the other nodes. In other words, the sum of all the external
current sources must vanish:

$$f_1 + f_2 + \cdots + f_n = 0,$$

and so there is no net current being fed into the network. Suppose we also extract a 1 amp
current from node 4; then the modified current source vector $f = (1, 0, 0, -1)^T$ indeed lies
in the image of $A^T$, as you can check, and the equilibrium system (6.29) has a solution.

† This assumes that there is only one wire joining the two nodes.

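This is all well and good, but we are not out of the woods yet. As we know, if a linear
system has a singular coefficient matrix, then either it has no solutions (the case we
already rejected) or it has infinitely many solutions (the case we are considering now).
In the particular network under consideration, the general solution to the linear system

$$\begin{pmatrix} 3 & -1 & -1 & -1 \\ -1 & 2 & 0 & -1 \\ -1 & 0 & 2 & -1 \\ -1 & -1 & -1 & 3 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix}$$

is found by Gaussian Elimination:

$$u = \begin{pmatrix} \tfrac12 + t \\ \tfrac14 + t \\ \tfrac14 + t \\ t \end{pmatrix} = \begin{pmatrix} \tfrac12 \\ \tfrac14 \\ \tfrac14 \\ 0 \end{pmatrix} + t \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \tag{6.32}$$

where $t = u_4$ is the free variable. The resulting nodal voltage potentials

$$u_1 = \tfrac12 + t, \quad u_2 = \tfrac14 + t, \quad u_3 = \tfrac14 + t, \quad u_4 = t,$$

depend on a free parameter $t$.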

The ambiguity arises because voltage potential is a mathematical abstraction that cannot
be measured directly; only relative potential differences have physical import. To
resolve the inherent ambiguity, we need to assign a baseline value for the voltage potentials.
In terrestrial electricity, the Earth is assumed to have zero potential. Specifying a
particular node to have zero potential is physically equivalent to grounding that node. For
our example, suppose we ground node 4 by setting $u_4 = 0$. This fixes the free variable $t = 0$
in our solution (6.32), and so uniquely specifies all the other voltage potentials:
$u_1 = \tfrac12$, $u_2 = \tfrac14$, $u_3 = \tfrac14$, $u_4 = 0$.

On the other hand, even without specification of a baseline potential level, the
corresponding physical voltages and currents along the wires are uniquely specified. In our
example, computing $y = v = A u$ gives

$$y_1 = v_1 = \tfrac14, \quad y_2 = v_2 = \tfrac14, \quad y_3 = v_3 = \tfrac12, \quad y_4 = v_4 = \tfrac14, \quad y_5 = v_5 = \tfrac14,$$

independent of the value of $t$ in (6.32). Thus, the nonuniqueness of the voltage potential
solution $u$ is an inessential feature. All physical quantities that we can measure (currents
and voltages) are uniquely specified by the solution to the equilibrium system.
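
A short Python check makes both points concrete for the unit-resistance network: every member of the one-parameter family (6.32) solves the equilibrium system, and all of them produce the same currents. This is a sketch in exact rational arithmetic; the sample values of $t$ and the test vector `w` are arbitrary choices of ours.

```python
from fractions import Fraction

A = [[1, -1, 0, 0], [1, 0, -1, 0], [1, 0, 0, -1], [0, 1, 0, -1], [0, 0, 1, -1]]
K = [[3, -1, -1, -1], [-1, 2, 0, -1], [-1, 0, 2, -1], [-1, -1, -1, 3]]
f = [1, 0, 0, -1]

def mat_vec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def potentials(t):
    """General solution (6.32) with free parameter t = u4."""
    return [Fraction(1, 2) + t, Fraction(1, 4) + t, Fraction(1, 4) + t, Fraction(t)]

# K annihilates the constant vector, so shifting every potential by t is harmless.
assert mat_vec(K, [1, 1, 1, 1]) == [0, 0, 0, 0]

# The columns of K sum to zero, so the entries of K w always add up to zero.
# Hence a source vector such as (1, 0, 0, 0) can never be matched.
w = [5, -2, 7, 1]   # arbitrary test vector
assert sum(mat_vec(K, w)) == 0

currents = []
for t in (0, 1, Fraction(-7, 3)):
    u = potentials(t)
    assert mat_vec(K, u) == f            # u solves K u = f for every t
    currents.append(mat_vec(A, u))       # y = v = A u, since C = I here

assert currents[0] == currents[1] == currents[2]
print([str(y) for y in currents[0]])
```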

Remark. Although they have no real physical meaning, we cannot dispense with the
nonmeasurable (and nonunique) voltage potentials $u$. Most networks are statically
indeterminate, since their incidence matrices are rectangular and hence not invertible, so the
linear system $A^T y = f$ cannot be solved directly for the currents in terms of the current
sources, since the system does not have a unique solution. Only by first solving the full
equilibrium system (6.29) for the potentials, and then using the relation $y = C A u$ between
the potentials and the currents, can we determine their actual values.

Let  us  analyze  what  is  going  on  in  the  context  of  our  general  mathematical  framework. 
Proposition  3.36  says  that  the  resistivity  matrix  K  =  ATCA  is  a  positive  semi-definite 
Gram  matrix,  which  is  positive  definite  (and  hence  nonsingular)  if  and  only  if  A  has 
linearly  independent  columns,  or,  equivalently,  ker  A  =  {0}.  But  Proposition  2.51  says 
that  the  incidence  matrix  A  of  a  directed  graph  never  has  a  trivial  kernel.  Therefore,  the 
resistivity  matrix  K  is  only  positive  semi-definite,  and  hence  singular.  If  the  network  is 
connected, then $\ker A = \ker K = \operatorname{coker} K$ is one-dimensional, spanned by the vector
$z = (1, 1, 1, \ldots, 1)^T$. According to the Fredholm Alternative Theorem 4.46, the fundamental
network equation $K u = f$ has a solution if and only if $f$ is orthogonal to $\operatorname{coker} K$, and so
the current source vector must satisfy

$$z \cdot f = f_1 + f_2 + \cdots + f_n = 0, \tag{6.33}$$


as  we  already  observed.  Therefore,  the  linear  algebra  reconfirms  our  physical  intuition:  a 
connected  network  admits  an  equilibrium  configuration,  obtained  by  solving  (6.29),  if  and 
only  if  the  nodal  current  sources  add  up  to  zero,  i.e.,  there  is  no  net  influx  of  current  into 
the  network. 

Grounding one of the nodes is equivalent to nullifying the value of its voltage potential:
$u_i = 0$. This variable is now fixed, and can be safely eliminated from our system. To
accomplish this, we let $A^\star$ denote the $m \times (n - 1)$ matrix obtained by deleting the $i$th
column from $A$. For example, grounding node 4 in our sample network, so $u_4 = 0$, allows
us to erase the fourth column of the incidence matrix (6.19), leading to the reduced incidence
matrix

$$A^\star = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{6.34}$$


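The key observation is that $A^\star$ has trivial kernel, $\ker A^\star = \{0\}$, and therefore the reduced
network resistivity matrix

$$K^\star = (A^\star)^T C A^\star = \begin{pmatrix} c_1 + c_2 + c_3 & -c_1 & -c_2 \\ -c_1 & c_1 + c_4 & 0 \\ -c_2 & 0 & c_2 + c_5 \end{pmatrix} \tag{6.35}$$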


is positive definite. Note that we can obtain $K^\star$ directly from $K$ in (6.30) by deleting both
its fourth row and fourth column. Let $f^\star = (1, 0, 0)^T$ denote the reduced current source
vector obtained by deleting the fourth entry from $f$. Then the reduced linear system is

$$K^\star u^\star = f^\star, \tag{6.36}$$

where $u^\star = (u_1, u_2, u_3)^T$ is the reduced voltage potential vector. Positive definiteness
of $K^\star$ implies that (6.36) has a unique solution $u^\star$, from which we can reconstruct the
voltages $v = A^\star u^\star$ and currents $y = C v = C A^\star u^\star$ along the wires. In our example, if all
the wires have unit resistance, then the reduced system (6.36) is

$$\begin{pmatrix} 3 & -1 & -1 \\ -1 & 2 & 0 \\ -1 & 0 & 2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},$$

and has unique solution $u^\star = (\tfrac12, \tfrac14, \tfrac14)^T$. The voltage potentials are

$$u_1 = \tfrac12, \quad u_2 = \tfrac14, \quad u_3 = \tfrac14, \quad u_4 = 0,$$

and correspond to the earlier solution (6.32) when $t = 0$. The corresponding voltages and
currents along the wires are the same as before.
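
The whole grounding procedure fits in a short Python sketch: delete the column of the grounded node, assemble the reduced matrix, solve, and reconstruct the voltages and currents. The helper names and the zero-based index `ground` are our own conventions, and unit conductances are assumed, as in the example.

```python
from fractions import Fraction

def solve(K, f):
    """Solve K u = f by Gaussian elimination in exact rational arithmetic.
    Assumes K is square and nonsingular."""
    n = len(K)
    M = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(K, f)]
    for j in range(n):
        p = next(i for i in range(j, n) if M[i][j] != 0)  # first usable pivot row
        M[j], M[p] = M[p], M[j]
        for i in range(j + 1, n):
            m = M[i][j] / M[j][j]
            M[i] = [a - m * b for a, b in zip(M[i], M[j])]
    u = [Fraction(0)] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][k] * u[k] for k in range(i + 1, n))
        u[i] = (M[i][n] - s) / M[i][i]
    return u

def mat_vec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

A = [[1, -1, 0, 0], [1, 0, -1, 0], [1, 0, 0, -1], [0, 1, 0, -1], [0, 0, 1, -1]]
c = [1, 1, 1, 1, 1]      # unit conductances
f = [1, 0, 0, -1]        # 1 amp in at node 1, 1 amp out at node 4
ground = 3               # zero-based index of the grounded node (node 4)

# Reduced incidence matrix and source vector: drop the grounded column and entry.
A_red = [[a for j, a in enumerate(row) if j != ground] for row in A]
f_red = [x for j, x in enumerate(f) if j != ground]

# Reduced resistivity matrix K* = (A*)^T C A*.
n = len(A_red[0])
K_red = [[sum(c[k] * A_red[k][i] * A_red[k][j] for k in range(len(A_red)))
          for j in range(n)] for i in range(n)]

u_red = solve(K_red, f_red)              # potentials at the ungrounded nodes
v = mat_vec(A_red, u_red)                # wire voltages
y = [ck * vk for ck, vk in zip(c, v)]    # wire currents, y = C v

# Kirchhoff's Current Law at all four nodes: A^T y = f.
kcl = [sum(A[k][i] * y[k] for k in range(len(A))) for i in range(4)]

print([str(x) for x in u_red])
print([str(x) for x in y])
```

Changing `ground` to another node index shifts the potentials by a constant but leaves `y` unchanged, provided `f` still sums to zero.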

Remark. When $C = I$, the matrix $K = A^T A$ constructed from the incidence matrix of a
directed graph is known in the mathematical literature as the graph Laplacian associated
with the graph. The graph Laplacian matrix can be easily constructed directly: its rows
and columns are indexed by the vertices. The diagonal entry $k_{ii}$ equals the degree of the
$i$th vertex, meaning the number of edges that have vertex $i$ as one of their endpoints. The
off-diagonal entries $k_{ij}$ are equal to $-1$ if there is an edge connecting vertices $i$ and $j$ and
0 otherwise. This is often written as $K = D - J$, where $D$ is the diagonal degree matrix,
whose diagonal entries are the degrees of the nodes, and $J$ is the symmetric adjacency
matrix, which contains a 1 in every off-diagonal entry corresponding to two adjacent nodes,
that is, two nodes connected by a single edge; all other entries are 0. Observe that the
graph Laplacian is independent of the direction assigned to the edges; it depends only
on the underlying graph. The term “Laplacian” is used because this matrix represents
the discrete analogue of the Laplacian differential operator, described in Examples 7.36
and 7.52 below. In particular, if the graph comes from an n-dimensional square grid,
the corresponding graph Laplacian coincides with the standard finite difference numerical
discretization of the Laplacian differential operator.
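
Both descriptions of the graph Laplacian, and its independence of the edge orientations, can be confirmed for the sample network with the following Python sketch. The function names are ours, and the list `flipped` is a hypothetical reorientation in which three of the five wires have been reversed.

```python
def incidence_matrix(edges, n):
    """One row per edge: +1 at the starting vertex, -1 at the ending vertex."""
    A = []
    for start, end in edges:
        row = [0] * n
        row[start - 1], row[end - 1] = 1, -1
        A.append(row)
    return A

def laplacian_from_incidence(edges, n):
    """Graph Laplacian computed as K = A^T A."""
    A = incidence_matrix(edges, n)
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]

def laplacian_from_degrees(edges, n):
    """Graph Laplacian computed as K = D - J (degree minus adjacency)."""
    degree = [0] * n
    J = [[0] * n for _ in range(n)]
    for i, j in edges:
        degree[i - 1] += 1
        degree[j - 1] += 1
        J[i - 1][j - 1] = J[j - 1][i - 1] = 1
    return [[(degree[r] if r == s else 0) - J[r][s] for s in range(n)]
            for r in range(n)]

edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]
flipped = [(2, 1), (1, 3), (4, 1), (2, 4), (4, 3)]   # same graph, other orientations

K1 = laplacian_from_incidence(edges, 4)
K2 = laplacian_from_incidence(flipped, 4)
K3 = laplacian_from_degrees(edges, 4)

for row in K1:
    print(row)
print(K1 == K2 == K3)
```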


Batteries, Power, and the Electrical-Mechanical Correspondence

So  far,  we  have  considered  only  the  effect  of  current  sources  at  the  nodes.  Suppose  now 
that  the  network  contains  one  or  more  batteries.  Each  battery  serves  as  a  voltage  source 
along  a  wire,  and  we  let  bk  denote  the  voltage  of  a  battery  connected  to  wire  k.  The  sign 
of  bk  indicates  the  relative  orientation  of  the  battery’s  terminals  with  respect  to  the  wire, 
with  bk  >  0  if  the  current  produced  by  the  battery  runs  in  the  same  direction  as  our  chosen 
orientation  of  the  wire.  The  battery’s  voltage  is  included  in  the  voltage  balance  equation 
(6.17):

$$v_k = u_i - u_j + b_k.$$

The corresponding vector equation (6.18) becomes

$$v = A u + b, \tag{6.37}$$

where $b = (b_1, b_2, \ldots, b_m)^T$ is the battery vector, whose entries are indexed by the wires.
(If there is no battery on wire $k$, the corresponding entry is $b_k = 0$.) The remaining two
equations are as before, so $y = C v$ are the currents in the wires, and, in the absence of
external current sources, Kirchhoff’s Current Law implies $A^T y = 0$. Using the modified
formula (6.37) for the voltages, these combine into the following equilibrium system:

$$K u = A^T C A u = -A^T C b. \tag{6.38}$$


Remark.  Interestingly,  the  voltage  potentials  satisfy  the  weighted  normal  equations  (5.36) 
that characterize the least squares solution to the system $A u = -b$ for the weighted norm

$\| v \| = \sqrt{v^T C v}$ (6.39)


determined  by  the  network’s  conductance  matrix  C.  It  is  a  striking  fact  that  Nature  solves 
a  least  squares  problem  in  order  to  make  the  weighted  norm  of  the  voltages  v  as  small  as 
possible.  A  similar  remark  holds  for  the  mass-spring  chains  considered  above. 

Batteries have exactly the same effect on the voltage potentials as if we imposed the
current source vector

$f = -A^T C b$. (6.40)

Namely, placing a battery of voltage $b_k$ on wire k is exactly the same as introducing
additional current sources of $-c_k b_k$ at the starting node and $c_k b_k$ at the ending node.
Note that the induced current vector $f \in \operatorname{coimg} A = \operatorname{img} K$ (see Exercise 3.4.32) continues
to satisfy the network constraint (6.33). Conversely, a system of allowed current sources
$f \in \operatorname{img} K$ has the same effect as any collection of batteries b that satisfies (6.40).

In  the  absence  of  external  current  sources,  a  network  with  batteries  always  admits 
a  solution  for  the  voltage  potentials  and  currents.  Although  the  currents  are  uniquely 
determined,  the  voltage  potentials  are  not.  As  before,  to  eliminate  the  ambiguity,  we  can 
ground one of the nodes and use the reduced incidence matrix $A^*$ and reduced current
source vector $f^*$ obtained by eliminating the column, respectively entry, corresponding to
the  grounded  node.  The  details  are  left  to  the  interested  reader. 
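
The grounding procedure and the least squares remark above can be checked numerically. The following is only a minimal sketch on a made-up three-node triangular network; the wiring, the conductances, and the 6 volt battery are illustrative assumptions, not taken from the text. It solves the reduced system for the potentials, compares the result with a weighted least squares solve of $A^* u = -b$, and confirms Kirchhoff's Current Law.

```python
import numpy as np

# Hypothetical triangle network (an assumption for illustration):
# wires 1->2, 2->3, 1->3; row k has +1 at the starting node of wire k
# and -1 at its ending node.
A = np.array([[1.0, -1.0,  0.0],
              [0.0,  1.0, -1.0],
              [1.0,  0.0, -1.0]])
C = np.diag([1.0, 2.0, 0.5])      # assumed conductances of the three wires
b = np.array([6.0, 0.0, 0.0])     # assumed 6 volt battery on wire 1

# Ground node 3: delete the last column of the incidence matrix.
A_red = A[:, :-1]
K_red = A_red.T @ C @ A_red       # reduced resistivity matrix
f_red = -A_red.T @ C @ b          # equivalent current sources, cf. (6.40)
u_red = np.linalg.solve(K_red, f_red)

# Weighted least squares: minimize || A_red u + b || in the C-weighted norm.
sqrtC = np.sqrt(C)                # elementwise square root of a diagonal matrix
u_ls, *_ = np.linalg.lstsq(sqrtC @ A_red, -sqrtC @ b, rcond=None)

v = A_red @ u_red + b             # voltages, cf. (6.37)
y = C @ v                         # currents
kirchhoff = A.T @ y               # vanishes when there are no current sources
```

Under these assumptions the two solves agree, and `kirchhoff` is zero up to rounding, including at the grounded node.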

Example 6.4. Consider an electrical network running along the sides of a cube, where
each wire contains a 2 ohm resistor and there is a 9 volt battery source on one wire. The
problem  is  to  determine  how  much  current  flows  through  the  wire  directly  opposite  the 
battery.  Orienting  the  wires  and  numbering  them  as  indicated  in  Figure  6.6,  the  incidence 
matrix  is 

We connect the battery along wire 1 and measure the resulting current along wire 12. To
avoid the ambiguity in the voltage potentials, we ground the last node and erase the final
column from A to obtain the reduced incidence matrix $A^*$. Since the resistance matrix R
has all 2’s along the diagonal, the conductance matrix is $C = \frac12 I$. Therefore, the network
resistivity matrix is one-half the cubical graph Laplacian:

$K^* = (A^*)^T C A^* = \frac12 (A^*)^T A^*$.

Alternatively, it can be found by eliminating the final row and column, representing the
grounded node, from the graph Laplacian matrix constructed by the above recipe. The
reduced current source vector corresponding to the battery

$b = (9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)^T$

situated on the first wire is

$f^* = -(A^*)^T C b = \left( -\frac92, \frac92, 0, 0, 0, 0, 0 \right)^T$.

Figure 6.6. Cubical Electrical Network.

Solving the resulting linear system $K^* u^* = f^*$ by Gaussian Elimination yields the voltage
potentials

$u^* = \left( -3, \frac94, -\frac98, -\frac98, \frac38, \frac38, -\frac34 \right)^T$.

Thus, the induced currents along the sides of the cube are

$y = C v = \frac12 (A^* u^* + b) = \left( \frac{15}{8}, -\frac{15}{16}, -\frac{15}{16}, \frac{15}{16}, \frac{15}{16}, -\frac34, -\frac{3}{16}, -\frac34, -\frac{3}{16}, \frac{3}{16}, \frac{3}{16}, -\frac38 \right)^T$.

In particular, the current on the wire that is opposite the battery is $y_{12} = -\frac38$, flowing
in the opposite direction to its orientation. The largest current flows through the battery
wire, while wires 7, 9, 10, 11 transmit the least.
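
The computation in this example can be reproduced in a few lines. This is a sketch under a stated assumption: since Figure 6.6 is not reproduced here, the wire numbering and orientations below (each wire running from the lower-numbered to the higher-numbered node, listed in lexicographic order) are a guess chosen to be consistent with the values quoted above, not a transcription of the figure.

```python
import numpy as np

# Assumed wiring of the cube: nodes 1..8, wire k oriented from its first
# to its second listed node.  Wire 1 carries the battery and wire 12
# is the edge directly opposite to it.
edges = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (3, 5),
         (3, 7), (4, 6), (4, 7), (5, 8), (6, 8), (7, 8)]

A = np.zeros((12, 8))
for k, (i, j) in enumerate(edges):
    A[k, i - 1] = 1.0             # starting node of wire k
    A[k, j - 1] = -1.0            # ending node of wire k

C = 0.5 * np.eye(12)              # 2 ohm resistors, so conductance 1/2
b = np.zeros(12)
b[0] = 9.0                        # 9 volt battery on wire 1

A_red = A[:, :7]                  # ground node 8
K_red = A_red.T @ C @ A_red
f_red = -A_red.T @ C @ b
u_red = np.linalg.solve(K_red, f_red)   # potentials at nodes 1..7
y = C @ (A_red @ u_red + b)             # currents in wires 1..12
```

With this wiring, `y[11]` is the current on the wire opposite the battery and `y[0]` is the current through the battery wire.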


As  with  a  mass-spring  chain,  the  voltage  potentials  in  such  a  resistive  electrical  network 
can  be  characterized  by  a  minimization  principle.  The  power  in  a  single  conducting  wire 
is defined as the product of its current $y_j$ and voltage $v_j$,

$P_j = y_j v_j = R_j y_j^2 = c_j v_j^2$, (6.41)

where $R_j$ is the resistance, $c_j = 1/R_j$ the conductance, and we are using Ohm’s Law
(6.21) to relate voltage and current. Physically, the power quantifies the rate at which
electrical energy is converted into heat by the wire’s resistance. Summing over all wires in
the system, the internal power† of the network

$P_{int} = \sum_j P_j = \sum_j c_j v_j^2 = \| v \|^2$

is identified as the square of the weighted norm (6.39).

† So far, we have not considered the effect of batteries or current sources on the network.




The Electrical-Mechanical Correspondence

Structures                  Variables                               Networks
--------------------------  --------------------------------------  ------------------
Displacements               $u$                                     Voltage potentials
Prestressed bars/springs    $b$                                     Batteries
Elongations*                $v = A u + b$                           Voltages
Spring stiffnesses          $C$                                     Conductivities
Internal forces             $y = C v$                               Currents
External forcing            $f = A^T y$                             Current sources
Stiffness matrix            $K = A^T C A$                           Resistivity matrix
Potential energy            $p(u) = \frac12 u^T K u - u^T f$        $\frac12 \times$ Power

Consider  a  network  that  contains  batteries,  but  no  external  current  sources.  Summing 
over  all  the  wires  in  the  network,  the  total  power  due  to  internal  and  external  sources  can 
be  identified  as  the  product  of  the  current  and  voltage  vectors: 

$P = y_1 v_1 + \cdots + y_m v_m = y^T v = v^T C v = (A u + b)^T C (A u + b)$

$= u^T A^T C A u + 2 u^T A^T C b + b^T C b$,

and is thus a quadratic function of the voltage potentials, which we rewrite in our usual
form*

$\frac12 P = p(u) = \frac12 u^T K u - u^T f + c$, (6.42)

where $K = A^T C A$ is the network resistivity matrix, while $f = -A^T C b$ are the equivalent
current sources at the nodes (6.40) that correspond to the batteries. The last term
$c = \frac12 b^T C b$ is one-half the internal power of the batteries, and is not affected by the
currents/voltages in the wires. In deriving (6.42), we have ignored external current sources
at the nodes. By the preceding discussion, external current sources can be viewed as an
equivalent collection of batteries, and so contribute to the linear terms $u^T f$ in the power,
which will then represent the combined effect of all batteries and external current sources.

In  general,  the  resistivity  matrix  K  is  only  positive  semi-definite,  and  so  the  quadratic 
power  function  (6.42)  does  not,  in  general,  possess  a  minimizer.  As  argued  above,  to  ensure 
equilibrium,  we  need  to  ground  one  or  more  of  the  nodes.  The  resulting  reduced  power 
function 

$p^*(u^*) = \frac12 (u^*)^T K^* u^* - (u^*)^T f^*$, (6.43)

has a positive definite coefficient matrix: $K^* > 0$. Its unique minimizer is the voltage
potential  u*  that  solves  the  reduced  linear  system  (6.36).  We  conclude  that  the  electrical 
network  adjusts  itself  so  as  to  minimize  the  power  or  total  energy  loss  throughout  the 
network.  As  in  mechanics,  Nature  solves  a  minimization  problem  in  an  effort  to  conserve 
energy. 
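
The minimization principle can be tested on a small example. The sketch below uses a made-up grounded triangle network (the incidence matrix, conductances, and battery are illustrative assumptions). It checks the identity $\frac12 P = p(u)$ from (6.42) at the equilibrium potentials and confirms that randomly perturbed potentials always give a larger value of the power function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical network: wires 1->2, 2->3, 1->3 with node 3 grounded,
# so only the columns for nodes 1 and 2 are kept.
A_red = np.array([[1.0, -1.0],
                  [0.0,  1.0],
                  [1.0,  0.0]])
C = np.diag([1.0, 2.0, 0.5])      # assumed conductances
b = np.array([6.0, 0.0, 0.0])     # assumed battery on wire 1

K = A_red.T @ C @ A_red           # reduced resistivity matrix
f = -A_red.T @ C @ b              # equivalent current sources
c = 0.5 * b @ C @ b               # constant term in (6.42)

def p(u):
    """Quadratic power function, including the constant term c."""
    return 0.5 * u @ K @ u - u @ f + c

def power(u):
    """Total power v^T C v for the voltages v = A u + b."""
    v = A_red @ u + b
    return v @ C @ v

u_star = np.linalg.solve(K, f)    # equilibrium potentials
trials = [u_star + rng.normal(size=2) for _ in range(100)]
```

Because the reduced matrix is positive definite, each perturbation $\delta$ raises the value by exactly $\frac12 \delta^T K \delta > 0$.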


*  Here,  we  use  v  instead  of  e  to  represent  elongation. 

* For alternating currents, the power is reduced by a factor of $\frac12$, so p(u) equals the power.

We have now discovered the remarkable correspondence between the equilibrium equations
for electrical networks (6.10) and those of mass-spring chains (6.28). This Electrical-
Mechanical Correspondence is summarized in the above table. In the following section, we
will see that the analogy extends to more general mechanical structures.


Exercises 


6.2.1.  Draw  the  electrical  networks  corresponding  to  the  following  incidence  matrices. 

6.2.2. Suppose that all wires in the illustrated network have unit resistivity.
(a) Write down the incidence matrix A. (b) Write down the equilibrium
system for the network when node 4 is grounded and there is a current
source of magnitude 3 at node 1. (c) Solve the system for the voltage
potentials at the ungrounded nodes. (d) If you connect a light bulb
to the network, which wire should you connect it to so that it shines
the brightest?


6.2.3.  What  happens  in  the  network  in  Figure  6.5  if  we  ground  both  nodes  3  and  4?  Set  up 
and  solve  the  system  and  compare  the  currents  for  the  two  cases. 


6.2.4.  (a)  Write  down  the  incidence  matrix  A  for  the  illustrated 
electrical  network,  (b)  Suppose  all  the  wires  contain  unit 
resistors,  except  for  R4  =  2.  Let  there  be  a  unit  current  source 
at  node  1,  and  assume  node  5  is  grounded.  Find  the  voltage 
potentials  at  the  nodes  and  the  currents  through  the  wires. 

(c)  Which  wire  would  shock  you  the  most? 

6.2.5.  Answer  Exercise  6.2.4  if,  instead  of  the  current  source,  you 
put  a  1.5  volt  battery  on  wire  1. 




6.2.6.  Consider  an  electrical  network  running  along  the  sides  of  a  tetrahedron. 
Suppose  that  each  wire  contains  a  3  ohm  resistor  and  there  is  a  10  volt 
battery  source  on  one  wire.  Determine  how  much  current  flows  through 
the  wire  directly  opposite  the  battery. 

6.2.7.  Now  suppose  that  each  wire  in  the  tetrahedral  network  in  Exercise 
6.2.6  contains  a  1  ohm  resistor  and  there  are  two  5  volt  battery  sources 
located  on  two  non- adjacent  wires.  Determine  how  much  current  flows 
through  the  wires  in  the  network. 


4b  6.2.8.  (a)  How  do  the  currents  change  if  the  resistances  in  the  wires  in  the  cubical  network  in 
Example  6.4  are  all  equal  to  1  ohm?  (b)  What  if  wire  k  has  resistance  Rk  =  k  ohms? 

X  6.2.9.  Suppose  you  are  given  six  resistors  with  respective  resistances  1,2,  3, 4,  5,  and  6.  How 
should  you  connect  them  in  a  tetrahedral  network  (one  resistor  per  wire)  so  that  a  light 
bulb  on  the  wire  opposite  the  battery  burns  the  brightest? 

X 6.2.10. The nodes in an electrical network lie on the vertices $(i/n, j/n)$ for $-n \le i, j \le n$ in a
square grid centered at the origin; the wires run along the grid lines. The boundary nodes,
when x or y = ±1, are all grounded. A unit current source is introduced at the origin.

(a)  Compute  the  potentials  at  the  nodes  and  currents  along  the  wires  for  n  =  2,3,  4. 

(b)  Investigate  and  compare  the  solutions  for  large  n,  i.e.,  as  the  grid  size  becomes  small. 

Do  you  detect  any  form  of  limiting  behavior? 

6.2.11.  Show  that,  in  a  network  with  all  unit  resistors,  the  currents  y  can  be  characterized  as 
the unique solution to the Kirchhoff equations $A^T y = f$ of minimum Euclidean norm.

6.2.12. True or false: (a) The nodal voltage potentials in a network with batteries b are the
same as in the same network with the current sources $f = -A^T C b$. (b) Are the currents
the same?

6.2.13.  (a)  Assuming  all  wires  have  unit  resistance,  find  the  voltage  potentials  at  all  the 
nodes  and  the  currents  along  the  wires  of  the  following  trees  when  the  bottom  node  is 
grounded  and  a  unit  current  source  is  introduced  at  the  top  node. 


(b)  Can  you  make  any  general  predictions  about  electrical  currents  in  trees? 

6.2.14.  A  node  in  a  tree  is  called  terminating  if  it  has  only  one  edge.  Repeat  the  preceding 
exercise  when  all  terminating  nodes  except  for  the  top  one  are  grounded. 

6.2.15.  Suppose  the  graph  of  an  electrical  network  is  a  tree,  as  in  Exercise  2.6.9.  Show  that  if 
one  of  the  nodes  in  the  tree  is  grounded,  the  system  is  statically  determinate. 

6.2.16.  Suppose  two  wires  in  a  network  join  the  same  pair  of  nodes.  Explain  why  their  effect 
on  the  rest  of  the  network  is  the  same  as  a  single  wire  whose  conductance  c  =  c1  +  c2  is  the 
sum  of  the  individual  conductances.  How  are  the  resistances  related? 

6.2.17.  (a)  Write  down  the  equilibrium  equations  for  a  network  that  contains  both  batteries 
and  current  sources,  (b)  Formulate  a  general  superposition  principle  for  such  situations, 

(c)  Write  down  a  formula  for  the  power  in  the  network. 

0  6.2.18.  Prove  that  the  voltage  potential  at  node  i  due  to  a  unit  current  source  at  node  j  is  the 
same  as  the  voltage  potential  at  node  j  due  to  a  unit  current  source  at  node  i.  Can  you 
give  a  physical  explanation  of  this  reciprocity  relation ? 

6.2.19.  What  is  the  analogue  of  condition  (6.33)  for  a  disconnected  graph? 


6.3  Structures 

A  structure  (sometimes  called  a  truss )  is  a  mathematical  idealization  of  a  framework  for 
a  building.  Think  of  a  radio  tower  or  a  skyscraper  when  just  the  I-beams  are  connected 
before  the  walls,  floors,  ceilings,  roof,  and  ornamentation  are  added.  An  ideal  structure 
is constructed of elastic bars connected at joints. By a bar, we mean a straight, rigid rod
that can be (slightly) elongated, but not bent. (Beams, which are allowed to bend, are
more complicated and are modeled by boundary value problems for ordinary and partial
differential equations, [61,79]. See also our discussion of splines in Section 5.5.) When
a bar is stretched, it obeys Hooke’s law (at least in the linear regime we are modeling)
and so, for all practical purposes, behaves like a spring with a very large stiffness. As
a result, a structure can be regarded as a two- or three-dimensional generalization of a
mass-spring chain.

Figure 6.7. Displacement of a Bar.

The  joints  will  allow  the  bar  to  rotate  in  any  direction.  Of  course,  this  is  an  idealization; 
in  a  building,  the  rivets  and  bolts  will  (presumably)  prevent  rotation  to  a  significant  degree. 
However,  under  moderate  stress  —  for  example,  if  the  wind  is  blowing  on  our  skyscraper, 
the  bolts  can  be  expected  only  to  keep  the  structure  connected,  and  the  resulting  motions 
will  induce  stresses  on  the  joints  that  must  be  taken  into  account  when  designing  the 
structure.  Of  course,  under  extreme  stress,  the  structure  will  fall  apart  —  a  disaster  that 
its  designers  must  avoid.  The  purpose  of  this  section  is  to  derive  conditions  that  will 
guarantee  that  a  structure  is  rigidly  stable  under  moderate  forcing,  or,  alternatively,  help 
us  to  understand  the  processes  that  might  lead  to  its  collapse. 

The  first  order  of  business  is  to  understand  how  an  individual  bar  reacts  to  motion.  We 
have  already  encountered  the  basic  idea  in  our  treatment  of  springs.  The  key  complication 
here  is  that  the  ends  of  the  bar  are  not  restricted  to  a  single  direction  of  motion,  but  can 
move  in  either  two-  or  three-dimensional  space.  We  use  d  to  denote  the  dimension  of  the 
underlying  space.  In  the  d  =  1-dimensional  case,  the  structure  reduces  to  a  mass-spring 
chain that we analyzed in Section 6.1. Here we concentrate on structures in d = 2 and 3
dimensions. 


Consider an unstressed bar with one end at position $a_1 \in \mathbb{R}^d$ and the other end at
position $a_2 \in \mathbb{R}^d$. In d = 2 dimensions, we write $a_i = (a_i, b_i)^T$, while in d = 3-dimensional
space, $a_i = (a_i, b_i, c_i)^T$. The length of the bar is $L = \| a_1 - a_2 \|$, where we use the standard
Euclidean norm to measure distance on $\mathbb{R}^d$ throughout this section.

Suppose we move the ends of the bar a little, sending $a_i$ to $b_i = a_i + \varepsilon u_i$ and, simultaneously,
$a_j$ to $b_j = a_j + \varepsilon u_j$, moving the blue bar in Figure 6.7 to the displaced orange
bar. The vectors $u_i, u_j \in \mathbb{R}^d$ indicate the respective directions of displacement of the two
ends, and we use $\varepsilon$ to represent the relative magnitude of the displacement. How much has
this motion stretched the bar? Since we are assuming that the bar can’t bend, the length
of the displaced bar is

$L + e = \| (a_i + \varepsilon u_i) - (a_j + \varepsilon u_j) \| = \| (a_i - a_j) + \varepsilon (u_i - u_j) \|$

$= \sqrt{ \| a_i - a_j \|^2 + 2 \varepsilon (a_i - a_j) \cdot (u_i - u_j) + \varepsilon^2 \| u_i - u_j \|^2 }$. (6.44)

The difference between the new length and the original length, namely

$e = \sqrt{ \| a_i - a_j \|^2 + 2 \varepsilon (a_i - a_j) \cdot (u_i - u_j) + \varepsilon^2 \| u_i - u_j \|^2 } - \| a_i - a_j \|$, (6.45)

is, by definition, the bar’s elongation.

If the underlying dimension d is 2 or more, the elongation (6.45) is a nonlinear function
of the displacement vectors $u_i, u_j$. Thus, an exact, geometrical treatment of structures
in equilibrium requires dealing with complicated nonlinear systems of equations. In some
situations, e.g., the design of robotic mechanisms, [57, 75], analysis of the nonlinear system
is crucial, but this lies beyond the scope of this text. However, in many practical situations,
the displacements are fairly small, so $|\varepsilon| \ll 1$. For example, when a building moves, the
lengths  of  bars  are  in  meters,  but  the  displacements  are,  barring  catastrophes,  typically  in 
centimeters  if  not  millimeters.  In  such  situations,  we  can  replace  the  geometrically  exact 
elongation  by  a  much  simpler  linear  approximation. 

As you learned in calculus, the most basic linear approximation to a nonlinear function
$g(\varepsilon)$ near $\varepsilon = 0$ is given by its tangent line or linear Taylor polynomial

$g(\varepsilon) \approx g(0) + g'(0)\, \varepsilon$,   $|\varepsilon| \ll 1$, (6.46)


as sketched in Figure 6.8. In the case of small displacements of a bar, the elongation (6.45)
is a square root function of the particular form

$g(\varepsilon) = \sqrt{a^2 + 2 \varepsilon b + \varepsilon^2 c^2} - a$,

where

$a = \| a_i - a_j \|$,   $b = (a_i - a_j) \cdot (u_i - u_j)$,   $c = \| u_i - u_j \|$

are independent of $\varepsilon$. Since $g(0) = 0$ and $g'(0) = b/a$, the linear approximation (6.46) is

$\sqrt{a^2 + 2 \varepsilon b + \varepsilon^2 c^2} - a \approx \varepsilon \, \frac{b}{a}$   for   $|\varepsilon| \ll 1$.

In this manner, we arrive at the linear approximation to the bar’s elongation

$e \approx \varepsilon \, \frac{(a_i - a_j) \cdot (u_i - u_j)}{\| a_i - a_j \|} = n \cdot (\varepsilon u_i - \varepsilon u_j)$,   where   $n = \frac{a_i - a_j}{\| a_i - a_j \|}$

is the unit vector, $\| n \| = 1$, that points in the direction of the bar from node j to node i.

The factor $\varepsilon$ was merely a mathematical device used to derive the linear approximation.
It can now be safely discarded, so that the displacement of the ith node is now $u_i$ instead of
$\varepsilon u_i$, and we assume $\| u_i \|$ is small. If bar k connects node i to node j, then its (approximate)
elongation is equal to

$e_k = n_k \cdot (u_i - u_j) = n_k \cdot u_i - n_k \cdot u_j$,   where   $n_k = \frac{a_i - a_j}{\| a_i - a_j \|}$. (6.47)

Figure 6.9. Unit Vectors for a Bar.


The elongation $e_k$ is the sum of two terms: the first, $n_k \cdot u_i$, is the component of the
displacement vector for node i in the direction of the unit vector $n_k$ that points along the
bar towards node i, whereas the second, $-n_k \cdot u_j$, is the component of the displacement
vector for node j in the direction of the unit vector $-n_k$ that points in the opposite direction
along the bar toward node j; see Figure 6.9. Their sum equals the total elongation of the
bar.
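
To see how good the linear approximation is, one can compare the exact elongation (6.45) with the linearized formula (6.47) for shrinking displacements. The following is a minimal sketch; the bar endpoints and displacement directions are arbitrary illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def exact_elongation(a_i, a_j, u_i, u_j):
    """New length minus original length, as in (6.45)."""
    return np.linalg.norm((a_i + u_i) - (a_j + u_j)) - np.linalg.norm(a_i - a_j)

def linear_elongation(a_i, a_j, u_i, u_j):
    """Linearized elongation n . (u_i - u_j), as in (6.47)."""
    n = (a_i - a_j) / np.linalg.norm(a_i - a_j)
    return n @ (u_i - u_j)

# Arbitrary bar and displacement directions, chosen only for illustration.
a_i, a_j = np.array([1.0, 1.0]), np.array([0.0, 0.0])
d_i, d_j = np.array([0.3, -1.0]), np.array([-0.5, 0.2])

errors = []
for eps in (1e-1, 1e-2, 1e-3):
    e_exact = exact_elongation(a_i, a_j, eps * d_i, eps * d_j)
    e_lin = linear_elongation(a_i, a_j, eps * d_i, eps * d_j)
    errors.append(abs(e_exact - e_lin))
```

The discrepancy shrinks roughly a hundredfold each time the displacement is reduced tenfold, which is the expected second-order behavior of a tangent line approximation.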

We assemble all the linear equations (6.47) relating nodal displacements to bar elongations
in matrix form

$e = A u$. (6.48)

Here $e = (e_1, \dots, e_m)^T \in \mathbb{R}^m$ is the vector of elongations, while
$u = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} \in \mathbb{R}^{dn}$ is the vector
of displacements. Each $u_i \in \mathbb{R}^d$ is itself a column vector with d entries, and so u has a
total of dn entries. For example, in the planar case d = 2, we have $u_i = (x_i, y_i)^T$, since each
node’s displacement has both an x and y component, and so

$u = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} = (x_1, y_1, x_2, y_2, \dots, x_n, y_n)^T \in \mathbb{R}^{2n}$.

In three dimensions, d = 3, we have $u_i = (x_i, y_i, z_i)^T$, and so each node will contribute
three components to the displacement vector

$u = (x_1, y_1, z_1, x_2, y_2, z_2, \dots, x_n, y_n, z_n)^T \in \mathbb{R}^{3n}$.

The incidence matrix A connecting the displacements and elongations will be of size
$m \times (dn)$. The kth row of A will have (at most) 2d nonzero entries. The entries in the d
slots corresponding to node i will be the components of the (transposed) unit bar vector $n_k^T$
pointing towards node i, as given in (6.47), while the entries in the d slots corresponding to
node j will be the components of its negative $-n_k^T$, which is the unit bar vector pointing
towards node j. All other entries are 0. The general mathematical formulation is best
appreciated by working through an explicit example.

Example 6.5. Consider the planar structure pictured in Figure 6.10. The four nodes
are at positions

$a_1 = (0, 0)^T$,  $a_2 = (1, 1)^T$,  $a_3 = (3, 1)^T$,  $a_4 = (4, 0)^T$,

so the two side bars are at 45° angles and the center bar is horizontal. Implementing our
construction, the associated incidence matrix is

$A = \left( \begin{array}{cc|cc|cc|cc} -\frac{1}{\sqrt2} & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} \end{array} \right)$. (6.49)


The  three  rows  of  A  refer  to  the  three  bars  in  our  structure.  The  columns  come  in  pairs, 
as  indicated  by  the  vertical  lines  in  the  matrix:  the  first  two  columns  refer  to  the  x  and 
y  displacements  of  the  first  node;  the  third  and  fourth  columns  refer  to  the  second  node; 
and so on. The first two entries of the first row of A indicate the unit vector

$n_1 = \frac{a_1 - a_2}{\| a_1 - a_2 \|} = \left( -\frac{1}{\sqrt2}, -\frac{1}{\sqrt2} \right)^T$

that points along the first bar towards the first node, while the third and fourth entries
have the opposite signs, and form the unit vector

$-n_1 = \frac{a_2 - a_1}{\| a_2 - a_1 \|} = \left( \frac{1}{\sqrt2}, \frac{1}{\sqrt2} \right)^T$

along the same bar that points in the opposite direction, towards the second node. The
remaining entries are zero because the first bar connects only the first two nodes. Similarly,
the unit vector along the second bar pointing towards node 2 is

$n_2 = \frac{a_2 - a_3}{\| a_2 - a_3 \|} = (-1, 0)^T$,

and this gives the third and fourth entries of the second row of A; the fifth and sixth entries
are their negatives, corresponding to the unit vector $-n_2$ pointing towards node 3. The
last row is constructed from the unit vectors along bar #3 in the same fashion.
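
The row-by-row recipe used in this example is easy to automate. The sketch below is one possible implementation; the helper name `structure_incidence` and the zero-based node indexing are conventions chosen for the illustration, not notation from the text. It places $n_k$ in the slots of the first node of each bar and $-n_k$ in the slots of the second, then applies the routine to the four nodes and three bars of Example 6.5.

```python
import numpy as np

def structure_incidence(nodes, bars):
    """Incidence matrix of a structure.

    nodes: sequence of n points in R^d; bars: list of zero-based pairs (i, j).
    Row k holds the unit vector n_k (pointing towards node i) in the d slots
    of node i and its negative in the d slots of node j.
    """
    nodes = np.asarray(nodes, dtype=float)
    n, d = nodes.shape
    A = np.zeros((len(bars), d * n))
    for k, (i, j) in enumerate(bars):
        n_k = (nodes[i] - nodes[j]) / np.linalg.norm(nodes[i] - nodes[j])
        A[k, d * i:d * (i + 1)] = n_k
        A[k, d * j:d * (j + 1)] = -n_k
    return A

# Example 6.5: bar 1 joins nodes 1 and 2, bar 2 joins nodes 2 and 3,
# bar 3 joins nodes 3 and 4 (written zero-based below).
nodes = [(0, 0), (1, 1), (3, 1), (4, 0)]
bars = [(0, 1), (1, 2), (2, 3)]
A = structure_incidence(nodes, bars)
```

Only the directions of the bars enter the result: rescaling all node positions by a common factor leaves `A` unchanged.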


Remark.  Interestingly,  the  incidence  matrix  for  a  structure  depends  only  on  the  directions 
of  the  bars  and  not  their  lengths.  This  is  analogous  to  the  fact  that  the  incidence  matrix 
for an electrical network depends only on the connectivity properties of the wires and not
on  their  overall  lengths.  One  can  regard  the  incidence  matrix  for  a  structure  as  a  kind  of 
d-dimensional  generalization  of  the  incidence  matrix  for  a  directed  graph. 

The  next  phase  of  our  procedure  is  to  introduce  the  constitutive  relations  for  the  bars 
that  determine  their  internal  forces  or  stresses.  As  we  remarked  at  the  beginning  of  the 
section,  each  bar  is  viewed  as  a  hard  spring,  subject  to  a  linear  Hooke’s  law  equation 


$y_k = c_k e_k$ (6.50)

that relates its elongation $e_k$ to its internal force $y_k$. The bar stiffness $c_k > 0$ is a positive
scalar, and so $y_k > 0$ if the bar is in tension, while $y_k < 0$ if the bar is compressed. We
write (6.50) in matrix form

$y = C e$, (6.51)

where $C = \operatorname{diag}(c_1, \dots, c_m) > 0$ is a diagonal, positive definite matrix.

Finally,  we  need  to  balance  the  forces  at  each  node  in  order  to  achieve  equilibrium. 
If bar k terminates at node i, then it exerts a force $-y_k n_k$ on the node, where $n_k$ is
the unit vector pointing towards the node in the direction of the bar, as in (6.47). The
minus sign comes from physics: if the bar is under tension, so $y_k > 0$, then it is trying to
contract back to its unstressed state, and so will pull the node towards it, in the opposite
direction to $n_k$, while a bar in compression will push the node away. In addition, we
may have an externally applied force vector, denoted by $f_i$, on node i, which might be
some combination of gravity, weights, mechanical forces, and so on. (In this admittedly
simplified model, external forces act only on the nodes and not directly on the bars.) Force
balance at equilibrium requires that all the nodal forces, external and internal, cancel; thus,

$f_i + \sum_k (-y_k n_k) = 0$,   or   $\sum_k y_k n_k = f_i$,

where the sum is over all the bars that are attached to node i. The matrix form of the
force balance equations is (and this should no longer come as a surprise)

$f = A^T y$, (6.52)

where $A^T$ is the transpose of the incidence matrix, and $f = \begin{pmatrix} f_1 \\ \vdots \\ f_n \end{pmatrix} \in \mathbb{R}^{dn}$ is the vector
containing all external forces on the nodes. Putting everything together, (6.48, 51, 52),

$e = A u$,   $y = C e$,   $f = A^T y$,

we are once again led to a by now familiar linear system of equations:

$K u = f$,   where   $K = A^T C A$ (6.53)

is the stiffness matrix for our structure.

The  stiffness  matrix  K  is  a  positive  (semi-) definite  Gram  matrix  (3.64)  associated  with 
the  weighted  inner  product  on  the  space  of  elongations  prescribed  by  the  diagonal  matrix 
C .  As  we  know,  K  will  be  positive  definite  if  and  only  if  the  kernel  of  the  incidence  matrix  is 
trivial:  ker  A  =  {0}.  However,  the  preceding  example  does  not  enjoy  this  property,  because 
we  have  not  tied  down  (or  “grounded”)  our  structure.  In  essence,  we  are  considering  a 
structure floating in outer space, which is free to move around in any direction. Each rigid
motion† of the structure will correspond to an element of the kernel of its incidence matrix,
and thereby preclude positive definiteness of its stiffness matrix.

Figure 6.11. A Triangular Structure.
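
The loss of positive definiteness can be observed directly for the structure of Example 6.5. The sketch below assumes unit bar stiffnesses (an assumption made only for the illustration) and uses the incidence matrix implied by the node positions of that example. It forms $K = A^T C A$ and inspects its rank and eigenvalues.

```python
import numpy as np

s = 1.0 / np.sqrt(2.0)
# Incidence matrix of Example 6.5 (rows: bars, column pairs: nodes).
A = np.array([[-s, -s,  s,  s,  0,  0,  0,  0],
              [ 0,  0, -1,  0,  1,  0,  0,  0],
              [ 0,  0,  0,  0, -s,  s,  s, -s]])
C = np.eye(3)                     # assumed unit stiffnesses

K = A.T @ C @ A                   # 8 x 8 stiffness matrix
eigs = np.linalg.eigvalsh(K)      # eigenvalues in ascending order
rank_A = np.linalg.matrix_rank(A)
kernel_dim = A.shape[1] - rank_A  # dimension of ker A
```

Here `rank_A` is 3, so the kernel is five-dimensional and `K` has five zero eigenvalues: it is positive semi-definite but not positive definite, and the unattached structure cannot have a unique equilibrium.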

Example  6.6.  Consider  a  planar  space  station  in  the  shape  of  a  unit  equilateral  triangle, 
as  in  Figure  6.11.  Placing  the  nodes  at  positions 

$a_1 = \left( \frac12, \frac{\sqrt3}{2} \right)^T$,  $a_2 = (1, 0)^T$,  $a_3 = (0, 0)^T$,

we use the preceding algorithm to construct the incidence matrix

whose rows are indexed by the bars, and whose columns are indexed in pairs by the three
nodes. The kernel of A is three-dimensional, with basis

$z_1 = (1, 0, 1, 0, 1, 0)^T$,  $z_2 = (0, 1, 0, 1, 0, 1)^T$,  $z_3 = \left( -\frac{\sqrt3}{2}, \frac12, 0, 1, 0, 0 \right)^T$. (6.54)


We claim that these three displacement vectors represent three different planar rigid motions:
the first two correspond to translations, and the third to a rotation.

The translations are easy to discern. Translating the space station in a horizontal
direction means that we move all three nodes the same amount, and so the displacements
are $u_1 = u_2 = u_3 = a$ for some vector a. In particular, a rigid unit horizontal translation
has $a = e_1 = (1, 0)^T$, and corresponds to the first kernel basis vector. Similarly, a unit
vertical translation of all three nodes corresponds to $a = e_2 = (0, 1)^T$, and corresponds
to the second kernel basis vector. Every other translation is a linear combination of these
two.  Translations  do  not  alter  the  lengths  of  any  of  the  bars,  and  so  do  not  induce  any 
stress  in  the  structure. 


† See Section 7.2 for an extended discussion of rigid motions.

Figure 6.12. Rotating a Space Station.


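The rotations are a little more subtle, owing to the linear approximation that we used to
compute the elongations. Referring to Figure 6.12, we see that rotating the space station
through a small angle $\varepsilon$ around the node $a_3 = (0, 0)^T$ will move the other two nodes to
positions

$b_1 = \begin{pmatrix} \frac12 \cos\varepsilon - \frac{\sqrt3}{2} \sin\varepsilon \\ \frac12 \sin\varepsilon + \frac{\sqrt3}{2} \cos\varepsilon \end{pmatrix}$,  $b_2 = \begin{pmatrix} \cos\varepsilon \\ \sin\varepsilon \end{pmatrix}$,  $b_3 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$. (6.55)

However, the corresponding displacements

$u_1 = b_1 - a_1 = \begin{pmatrix} \frac12 (\cos\varepsilon - 1) - \frac{\sqrt3}{2} \sin\varepsilon \\ \frac12 \sin\varepsilon + \frac{\sqrt3}{2} (\cos\varepsilon - 1) \end{pmatrix}$,  $u_2 = b_2 - a_2 = \begin{pmatrix} \cos\varepsilon - 1 \\ \sin\varepsilon \end{pmatrix}$,  $u_3 = b_3 - a_3 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ (6.56)

do not combine into a vector that belongs to ker A. The problem is that, under a rotation,
the nodes move along circles, while the kernel displacements $u = \varepsilon z \in \ker A$ correspond
to straight line motion! In order to maintain consistency, we must adopt a similar linear
approximation of the nonlinear circular motion of the nodes. Thus, we replace the nonlinear
displacements $u_j(\varepsilon)$ in (6.56) by their linear tangent approximations‡ $\varepsilon u_j'(0)$, so

$u_1 \approx \varepsilon \left( -\frac{\sqrt3}{2}, \frac12 \right)^T$,  $u_2 \approx \varepsilon\, (0, 1)^T$,  $u_3 = (0, 0)^T$.

The resulting displacements do combine to produce the displacement vector

$u = \varepsilon \left( -\frac{\sqrt3}{2}, \frac12, 0, 1, 0, 0 \right)^T = \varepsilon z_3$

that moves the space station in the direction of the third kernel basis vector. Thus, as
claimed, $z_3$ represents the linear approximation to a rigid rotation around the third node.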

‡ Note that $u_j(0) = 0$.

Remarkably, the rotations around the other two nodes, although distinct nonlinear
motions, can be linearly approximated by particular combinations of the three kernel basis
elements $z_1, z_2, z_3$, and so already appear in our description of ker A. For example, the
displacement vector

$u = \varepsilon \left( -\frac{\sqrt3}{2} z_1 + \frac12 z_2 - z_3 \right) = \varepsilon \left( 0, 0, -\frac{\sqrt3}{2}, -\frac12, -\frac{\sqrt3}{2}, \frac12 \right)^T$ (6.57)



represents  the  linear  approximation  to  a  rigid  rotation  around  the  first  node.  We  conclude 
that  the  three-dimensional  kernel  of  the  incidence  matrix  represents  the  sum  total  of  all 
possible  rigid  motions  of  the  space  station,  or,  more  correctly,  their  linear  approximations. 

Which types of forces will maintain the space station in equilibrium? This will happen
if and only if we can solve the force balance equations $A^T \mathbf{y} = \mathbf{f}$ for the internal forces
$\mathbf{y}$. The Fredholm Alternative Theorem 4.46 implies that this system has a solution if and
only if $\mathbf{f}$ is orthogonal to $\operatorname{coker} A^T = \ker A$. Therefore, $\mathbf{f} = (f_1, g_1, f_2, g_2, f_3, g_3)^T$ must be
orthogonal to the kernel basis vectors (6.54), and so must satisfy the three linear constraints
$$\begin{aligned} \mathbf{z}_1 \cdot \mathbf{f} &= f_1 + f_2 + f_3 = 0, \\ \mathbf{z}_2 \cdot \mathbf{f} &= g_1 + g_2 + g_3 = 0, \\ \mathbf{z}_3 \cdot \mathbf{f} &= -\tfrac{\sqrt3}{2}\, f_1 + \tfrac12\, g_1 + g_2 = 0. \end{aligned} \tag{6.58}$$

The  first  constraint  requires  that  there  be  no  net  horizontal  force  on  the  space  station. 
The  second  requires  no  net  vertical  force.  The  last  constraint  requires  that  the  moment 
of  the  forces  around  the  third  node  vanishes.  The  vanishing  of  the  force  moments  around 
each  of  the  other  two  nodes  is  a  consequence  of  these  three  conditions,  since  the  associated 
kernel  vectors  can  be  expressed  as  linear  combinations  of  the  three  basis  elements.  The 
corresponding  physical  requirements  are  clear.  If  there  is  a  net  horizontal  or  vertical  force, 
the  space  station  will  rigidly  translate  in  that  direction;  if  there  is  a  non-zero  force  moment, 
the  station  will  rigidly  rotate.  In  any  event,  unless  the  force  balance  constraints  (6.58)  are 
satisfied,  the  space  station  cannot  remain  in  equilibrium.  A  freely  floating  space  station  is 
an  unstable  structure  that  can  easily  be  set  into  motion  with  a  tiny  external  force. 
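As a small illustration of the three balance conditions (a sketch only), the code below stores the kernel basis vectors whose coefficients appear in (6.58) and tests two loads. The function names and both sample loads are hypothetical, chosen purely for illustration.

```python
import numpy as np

# Kernel basis read off from the constraints (6.58).
# Force ordering: (f1, g1, f2, g2, f3, g3).
Z = np.array([
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],               # z1: horizontal translation
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],               # z2: vertical translation
    [-np.sqrt(3) / 2, 0.5, 0.0, 1.0, 0.0, 0.0],   # z3: linearized rotation about node 3
])

def balance_residuals(f):
    """Return (net horizontal force, net vertical force, z3 . f) for the load f."""
    return Z @ np.asarray(f, dtype=float)

def maintains_equilibrium(f, tol=1e-12):
    """True when f is orthogonal to all three rigid motions."""
    return bool(np.all(np.abs(balance_residuals(f)) < tol))

# Hypothetical loads, for illustration only.
pull_apart = [0, 0, 1, 0, -1, 0]   # equal and opposite forces on nodes 2 and 3
push_right = [1, 0, 0, 0, 0, 0]    # a single horizontal force on node 1

print(maintains_equilibrium(pull_apart))   # all three constraints hold
print(balance_residuals(push_right))       # nonzero net force and nonzero moment
```

The second load violates both the first and the third constraint, so in this linear model it would translate and rotate the station at the same time.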

Since  there  are  three  independent  rigid  motions,  we  must  impose  three  constraints  on 
the  structure  in  order  to  fully  stabilize  it  under  general  external  forcing.  “Grounding”  one 
of  the  nodes,  i.e.,  preventing  it  from  moving  by  attaching  it  to  a  fixed  support,  will  serve 
to  eliminate  the  two  translational  instabilities.  For  example,  setting  u3  =  0  has  the  effect 
of  fixing  the  third  node  of  the  space  station  to  a  support.  With  this  specification,  we  can 
eliminate the variables associated with that node, and thereby delete the corresponding
columns of the incidence matrix, leaving a reduced incidence matrix $A^*$.
The kernel of $A^*$ is one-dimensional, spanned by the single vector $\mathbf{z}_3^* = \left( -\tfrac{\sqrt3}{2},\ \tfrac12,\ 0,\ 1 \right)^T$,
which corresponds to (the linear approximation of) the rotations around the fixed node. To
prevent the structure from rotating, we can also fix the second node, by further requiring
$\mathbf{u}_2 = \mathbf{0}$. This serves to also eliminate the third and fourth columns of the original incidence
matrix. The resulting “doubly reduced” incidence matrix $A^{**}$
has trivial kernel: $\ker A^{**} = \{\mathbf{0}\}$. Therefore, the corresponding reduced stiffness matrix
$$K^{**} = (A^{**})^T A^{**} = \begin{pmatrix} \frac12 & 0 \\ 0 & \frac32 \end{pmatrix}$$

is  positive  definite.  A  planar  triangle  with  two  fixed  nodes  is  a  stable  structure,  which  can 
now  support  an  arbitrary  external  forcing  on  the  remaining  free  node.  (Forces  on  the  fixed 
nodes  have  no  effect,  since  they  are  no  longer  allowed  to  move.) 
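The effect of grounding nodes can be sketched numerically by deleting columns and counting the kernel dimension. The incidence matrix below is rebuilt from the node positions $\mathbf{a}_1 = (\frac12, \frac{\sqrt3}{2})^T$, $\mathbf{a}_2 = (1,0)^T$, $\mathbf{a}_3 = (0,0)^T$; the ordering and orientation of the bars are assumptions, and `kernel_dimension` is a hypothetical helper.

```python
import numpy as np

s = np.sqrt(3) / 2
# Triangle incidence matrix, columns ordered (u1, v1, u2, v2, u3, v3).
A = np.array([
    [-0.5,   s, 0.5,  -s,  0.0, 0.0],   # bar joining nodes 1 and 2
    [ 0.5,   s, 0.0, 0.0, -0.5,  -s],   # bar joining nodes 1 and 3
    [ 0.0, 0.0, 1.0, 0.0, -1.0, 0.0],   # bar joining nodes 2 and 3
])

def kernel_dimension(M, tol=1e-10):
    """Number of columns minus the rank."""
    return int(M.shape[1] - np.linalg.matrix_rank(M, tol=tol))

A_one = A[:, :4]          # ground node 3: drop its two columns
A_two = A[:, :2]          # also ground node 2: drop two more columns
K_two = A_two.T @ A_two   # doubly reduced stiffness matrix with C = I

print(kernel_dimension(A), kernel_dimension(A_one), kernel_dimension(A_two))
print(K_two)
print(np.linalg.eigvalsh(K_two))   # both eigenvalues positive: positive definite
```

The three kernel dimensions drop from 3 to 1 to 0, mirroring the loss of the two translations and then the rotation.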


In general, a planar structure without any fixed nodes will have at least a
three-dimensional kernel, corresponding to the rigid motions of translations and (linear
approximations to) rotations. To stabilize the structure, one must fix two (non-coincident) nodes.
A three-dimensional structure that is not tied to any fixed supports will admit 6
independent rigid motions in its kernel. Three of these correspond to rigid translations in the
three coordinate directions, while the other three correspond to linear approximations to
the rigid rotations around the three coordinate axes. To eliminate the rigid motion
instabilities of the structure, we need to fix three non-collinear nodes. Indeed, fixing one node
will eliminate translations; fixing two nodes will still leave the rotations around the axis
through the fixed nodes. Details can be found in the exercises.

Even  after  a  sufficient  number  of  nodes  have  been  attached  to  fixed  supports  so  as  to 
eliminate  all  possible  rigid  motions,  there  may  still  remain  nonzero  vectors  in  the  kernel 
of  the  reduced  incidence  matrix  of  the  structure.  These  indicate  additional  instabilities 
that  allow  the  shape  of  the  structure  to  deform  without  any  applied  force.  Such  non-rigid 
motions  are  known  as  mechanisms  of  the  structure.  Since  a  mechanism  moves  the  nodes 
without  elongating  any  of  the  bars,  it  does  not  induce  any  internal  forces.  A  structure  that 
admits  a  mechanism  is  unstable  —  even  tiny  external  forces  may  provoke  a  large  motion. 

Consider the three-bar structure of Example 6.5, but now with its two
ends attached to supports, as pictured in Figure 6.13. Since we are fixing nodes 1 and 4,
we set $\mathbf{u}_1 = \mathbf{u}_4 = \mathbf{0}$. Hence, we should remove the first and last column pairs from the
incidence matrix (6.49), leading to the reduced incidence matrix
$$A^* = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} \end{pmatrix}.$$
The structure no longer admits any rigid motions. However, the kernel of $A^*$ is
one-dimensional, spanned by the reduced displacement vector $\mathbf{z}^* = (1, -1, 1, 1)^T$, which
corresponds to the unstable mechanism that displaces the second node in the direction
$\mathbf{u}_2 = (1, -1)^T$ and the third node in the direction $\mathbf{u}_3 = (1, 1)^T$. Geometrically, then, $\mathbf{z}^*$
represents the displacement whereby node 2 moves down and to the right at a 45° angle,
while node 3 moves simultaneously up and to the right at a 45° angle; the result of the
mechanism is sketched in Figure 6.14. This mechanism does not alter the lengths of the
three bars (at least in our linear approximation regime) and so requires no net force to be
set into motion.

Figure 6.14. Unstable Mechanism of the Three Bar Structure.
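One way to find such a mechanism numerically is to read the kernel off the singular value decomposition. The sketch below uses the reduced incidence matrix of the three-bar structure with nodes 1 and 4 fixed; `kernel_basis` is a hypothetical helper, and the tolerance is an arbitrary choice.

```python
import numpy as np

r = 1 / np.sqrt(2)
# Reduced incidence matrix of the three-bar structure, columns (u2, v2, u3, v3).
A_red = np.array([
    [   r,   r, 0.0, 0.0],
    [-1.0, 0.0, 1.0, 0.0],
    [ 0.0, 0.0,  -r,   r],
])

def kernel_basis(M, tol=1e-10):
    """Orthonormal basis of ker M, taken from the trailing right singular vectors."""
    _, sing, Vt = np.linalg.svd(M)
    rank = int(np.sum(sing > tol))
    return Vt[rank:].T

N = kernel_basis(A_red)
z = N[:, 0] / N[0, 0]    # rescale so that the first entry equals 1

print(N.shape[1])        # dimension of the kernel
print(z)                 # proportional to the mechanism vector
print(A_red @ z)         # no bar is elongated, to first order
```

Rescaling by the first entry is only for readability; any nonzero multiple of `z` describes the same mechanism.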

As with the rigid motions of the space station, an external forcing vector $\mathbf{f}^*$ will maintain
equilibrium only when it lies in the coimage of $A^*$, and hence, by the Fredholm Alternative,
must be orthogonal to all the mechanisms in $\ker A^*$. Thus, the nodal forces $\mathbf{f}_2 = (f_2, g_2)^T$
and $\mathbf{f}_3 = (f_3, g_3)^T$ must satisfy the balance law
$$\mathbf{z}^* \cdot \mathbf{f}^* = f_2 - g_2 + f_3 + g_3 = 0.$$


If this fails, the equilibrium equation has no solution, and the structure will be set into
motion. For example, a uniform horizontal force $f_2 = f_3 = 1$, $g_2 = g_3 = 0$, will induce
the mechanism, whereas a uniform vertical force, $f_2 = f_3 = 0$, $g_2 = g_3 = 1$, will maintain
equilibrium. In the latter case, the equilibrium equations
$$K^* \mathbf{u}^* = \mathbf{f}^*, \qquad \text{where} \qquad K^* = (A^*)^T A^* = \begin{pmatrix} \frac32 & \frac12 & -1 & 0 \\ \frac12 & \frac12 & 0 & 0 \\ -1 & 0 & \frac32 & -\frac12 \\ 0 & 0 & -\frac12 & \frac12 \end{pmatrix},$$
have an indeterminate solution
$$\mathbf{u}^* = (-3, 5, -2, 0)^T + t\,(1, -1, 1, 1)^T,$$
since we can add in any element of $\ker K^* = \ker A^*$. In other words, the equilibrium
position is not unique, since the structure can still be displaced in the direction of the
unstable mechanism while maintaining the overall force balance. On the other hand, the
elongations and internal forces
$$\mathbf{y} = \mathbf{e} = A^* \mathbf{u}^* = \left( \sqrt2,\ 1,\ \sqrt2 \right)^T,$$


are  well  defined,  indicating  that,  under  our  stabilizing  uniform  vertical  (upwards)  force,  all 
three  bars  are  elongated,  with  the  two  diagonals  experiencing  41.4%  more  elongation  than 
the  horizontal  bar. 
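The non-uniqueness of the displacement and the uniqueness of the elongations can both be checked in a few lines. This is a sketch under the assumption $C = I$; the shift `7.5` along the mechanism and the cutoff `rcond=1e-10` are arbitrary illustrative choices.

```python
import numpy as np

r = 1 / np.sqrt(2)
A_red = np.array([
    [   r,   r, 0.0, 0.0],
    [-1.0, 0.0, 1.0, 0.0],
    [ 0.0, 0.0,  -r,   r],
])
K = A_red.T @ A_red                    # singular reduced stiffness matrix (C = I)
z = np.array([1.0, -1.0, 1.0, 1.0])    # mechanism vector

f_vertical = np.array([0.0, 1.0, 0.0, 1.0])
f_horizontal = np.array([1.0, 0.0, 1.0, 0.0])
print(z @ f_vertical, z @ f_horizontal)   # only the vertical load is orthogonal to z

# lstsq copes with the singular K and returns the minimum-norm solution.
u_min, *_ = np.linalg.lstsq(K, f_vertical, rcond=1e-10)
u_text = np.array([-3.0, 5.0, -2.0, 0.0])  # particular solution quoted in the text

for u in (u_min, u_text, u_text + 7.5 * z):
    # Every candidate solves K u = f, and all give the same elongations.
    print(np.allclose(K @ u, f_vertical), A_red @ u)
```

Each displacement differs from the others by a multiple of the mechanism, yet the printed elongations agree, which is what makes the internal forces well defined.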


Remark. Just like the rigid rotations, the mechanisms described here are linear
approximations to the actual nonlinear motions. In a physical structure, the vertices will move
along curves whose tangents at the initial configuration are the directions indicated by the
mechanism vector. In the linear approximation illustrated in Figure 6.14, the lengths of
the bars will change slightly. In the true nonlinear mechanism, illustrated in Figure 6.15,
the nodes must move along circles so as to rigidly preserve the lengths of all three bars. In
certain cases, a structure can admit a linear mechanism, but one that cannot be physically
realized due to the nonlinear constraints imposed by the geometrical configurations of the
bars. Nevertheless, such a structure is at best borderline stable, and should not be used in
any real-world constructions.

Figure 6.15. Nonlinear Mechanism of the Three Bar Structure.
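The statement that the bar lengths "change slightly" under the linear mechanism can be made quantitative. The sketch below assumes node positions $(0,0)$, $(1,1)$, $(3,1)$, $(4,0)$ for the three-bar structure; these are not listed in this section and are chosen because they reproduce the bar directions in the incidence matrix. It moves the free nodes a distance proportional to $\varepsilon$ along the mechanism and measures the true change in length.

```python
import numpy as np

# Assumed node positions (not given in this section); nodes 1 and 4 are fixed.
nodes = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 1.0], [4.0, 0.0]])
bars = [(0, 1), (1, 2), (2, 3)]
# Mechanism z* = (1, -1, 1, 1) acting on nodes 2 and 3 only.
mechanism = np.array([[0.0, 0.0], [1.0, -1.0], [1.0, 1.0], [0.0, 0.0]])

def bar_lengths(points):
    """Euclidean length of every bar for the given node positions."""
    return np.array([np.linalg.norm(points[j] - points[i]) for i, j in bars])

rest = bar_lengths(nodes)
ratios = []
for eps in (1e-1, 1e-2, 1e-3):
    change = bar_lengths(nodes + eps * mechanism) - rest
    ratios.append(np.abs(change).max() / eps**2)
    print(eps, np.abs(change).max(), ratios[-1])
```

The ratio of the largest length change to $\varepsilon^2$ stays roughly constant, so the bars stretch only at second order, which the linear model cannot see.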


We  can  always  stabilize  a  structure  by  first  fixing  nodes  to  eliminate  rigid  motions,  and 
then  adding  in  a  sufficient  number  of  extra  bars  to  prevent  mechanisms.  In  the  preceding 
example,  suppose  we  attach  an  additional  bar  connecting  nodes  2  and  4,  leading  to  the 
reinforced  structure  in  Figure  6.16.  The  revised  incidence  matrix  is 


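$$A = \begin{pmatrix} -\frac{1}{\sqrt2} & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} \\ 0 & 0 & -\frac{3}{\sqrt{10}} & \frac{1}{\sqrt{10}} & 0 & 0 & \frac{3}{\sqrt{10}} & -\frac{1}{\sqrt{10}} \end{pmatrix}$$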

and  is  obtained  from  (6.49)  by  appending  another  row  representing  the  added  bar.  When 
nodes 1 and 4 are fixed, the reduced incidence matrix
$$A^* = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\ -\frac{3}{\sqrt{10}} & \frac{1}{\sqrt{10}} & 0 & 0 \end{pmatrix}$$
has trivial kernel, $\ker A^* = \{\mathbf{0}\}$, and hence the reinforced structure is stable. It admits no
mechanisms, and can support any configuration of forces (within reason: mathematically,
the structure will support an arbitrarily large external force, but very large forces will take
us outside the linear regime described by the model, and the structure may be crushed).

This  particular  case  is  statically  determinate  owing  to  the  fact  that  the  incidence  matrix 
is  square  and  nonsingular,  which  implies  that  one  can  solve  the  force  balance  equations 
(6.52) directly for the internal forces. For instance, a uniform downwards vertical force
$f_2 = f_3 = 0$, $g_2 = g_3 = -1$, e.g., gravity, will produce the internal forces
$$y_1 = -\sqrt2, \qquad y_2 = -1, \qquad y_3 = -\sqrt2, \qquad y_4 = 0,$$
indicating that bars 1, 2 and 3 are compressed, while, interestingly, the reinforcing bar 4
remains unchanged in length and hence experiences no internal force. Assuming that the
bars are all of the same material, and taking the elastic constant to be 1, so $C = I$, then
the reduced stiffness matrix is
$$K^* = (A^*)^T A^* = \begin{pmatrix} \frac{12}{5} & \frac15 & -1 & 0 \\ \frac15 & \frac35 & 0 & 0 \\ -1 & 0 & \frac32 & -\frac12 \\ 0 & 0 & -\frac12 & \frac12 \end{pmatrix}.$$
The solution to the reduced equilibrium equations is
$$\mathbf{u}^* = \left( -\tfrac12,\ -\tfrac32,\ -\tfrac32,\ -\tfrac72 \right)^T, \qquad \text{so} \qquad \mathbf{u}_2 = \left( -\tfrac12,\ -\tfrac32 \right)^T, \qquad \mathbf{u}_3 = \left( -\tfrac32,\ -\tfrac72 \right)^T$$
give the displacements of the two nodes under the applied force. Both are moving down
and to the left, with node 3 moving relatively farther owing to its lack of reinforcement.

Figure 6.17. Doubly Reinforced Planar Structure.
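For the statically determinate case, the two-step computation described above can be sketched as follows: first solve the force balance equations for the internal forces, then recover the displacements. The matrix is the reduced incidence matrix of the structure reinforced by the bar joining nodes 2 and 4, and $C = I$ is assumed throughout.

```python
import numpy as np

r, t = 1 / np.sqrt(2), 1 / np.sqrt(10)
# Reduced incidence matrix of the reinforced structure, columns (u2, v2, u3, v3).
A_red = np.array([
    [     r,   r, 0.0, 0.0],
    [  -1.0, 0.0, 1.0, 0.0],
    [   0.0, 0.0,  -r,   r],
    [-3 * t,   t, 0.0, 0.0],   # reinforcing bar joining nodes 2 and 4
])
f = np.array([0.0, -1.0, 0.0, -1.0])   # uniform downward unit force

# Square, nonsingular incidence matrix: force balance alone determines y.
y = np.linalg.solve(A_red.T, f)
# With C = I the elongations equal the internal forces, e = y = A u.
u = np.linalg.solve(A_red, y)

K = A_red.T @ A_red
print(y)                        # internal forces in bars 1 to 4
print(u)                        # displacements (u2, v2, u3, v3)
print(np.allclose(K @ u, f))    # the same u solves the stiffness equations
```

Solving two triangular-sized systems with the incidence matrix avoids forming the stiffness matrix at all, which is the practical meaning of static determinacy here.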

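Suppose we reinforce the structure yet further by adding in a bar connecting nodes 1
and 3, as in Figure 6.17. The resulting reduced incidence matrix
$$A^* = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\ -\frac{3}{\sqrt{10}} & \frac{1}{\sqrt{10}} & 0 & 0 \\ 0 & 0 & \frac{3}{\sqrt{10}} & \frac{1}{\sqrt{10}} \end{pmatrix}$$
again has trivial kernel, $\ker A^* = \{\mathbf{0}\}$, and hence the structure is stable. Indeed, adding
extra bars to a stable structure cannot cause it to lose stability. (In the language of
linear algebra, appending additional rows to a matrix cannot increase the size of its kernel,
cf. Exercise 2.5.10.) Since the incidence matrix is rectangular, the structure is now statically
indeterminate, and we cannot determine the internal forces without first solving the full
equilibrium equations (6.53) for the displacements. The stiffness matrix is
$$K^* = (A^*)^T A^* = \begin{pmatrix} \frac{12}{5} & \frac15 & -1 & 0 \\ \frac15 & \frac35 & 0 & 0 \\ -1 & 0 & \frac{12}{5} & -\frac15 \\ 0 & 0 & -\frac15 & \frac35 \end{pmatrix}.$$
Under the same uniform vertical force, the displacement $\mathbf{u}^* = \left( \tfrac{1}{10},\ -\tfrac{17}{10},\ -\tfrac{1}{10},\ -\tfrac{17}{10} \right)^T$
indicates that the free nodes now move symmetrically down and towards the center of the
structure. The internal forces on the bars are
$$\mathbf{y} = \mathbf{e} = A^* \mathbf{u}^* = \left( -\tfrac45 \sqrt2,\ -\tfrac15,\ -\tfrac45 \sqrt2,\ -\sqrt{\tfrac25},\ -\sqrt{\tfrac25} \right)^T \approx (-1.1314,\ -.2,\ -1.1314,\ -.6325,\ -.6325)^T.$$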
All  five  bars  are  now  experiencing  compression,  with  the  two  outside  bars  being  the  most 
stressed.  This  relatively  simple  computation  should  already  indicate  to  the  practicing 
construction  engineer  which  of  the  bars  in  the  structure  are  more  likely  to  collapse  under 
an  applied  external  force. 
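In the statically indeterminate case the order of operations is reversed: the displacements come first, from the stiffness equations, and the internal forces follow. A minimal sketch, assuming $C = I$ and taking the fifth row to be the bar joining nodes 1 and 3 (the mirror image of the bar joining nodes 2 and 4):

```python
import numpy as np

r, t = 1 / np.sqrt(2), 1 / np.sqrt(10)
# Reduced incidence matrix of the doubly reinforced structure (5 bars, 4 unknowns).
A_red = np.array([
    [     r,   r,   0.0, 0.0],
    [  -1.0, 0.0,   1.0, 0.0],
    [   0.0, 0.0,    -r,   r],
    [-3 * t,   t,   0.0, 0.0],   # bar joining nodes 2 and 4
    [   0.0, 0.0, 3 * t,   t],   # bar joining nodes 1 and 3
])
f = np.array([0.0, -1.0, 0.0, -1.0])   # uniform downward unit force

K = A_red.T @ A_red          # positive definite, since ker A* is trivial
u = np.linalg.solve(K, f)    # displacements must be found first
y = A_red @ u                # then the internal forces, using C = I

print(u)
print(y)
print(int(np.argmax(np.abs(y))) + 1)   # index of a most stressed bar
```

Because `A_red.T` has more columns than rows, the force balance equations alone admit many solutions, which is why the stiffness system has to be solved here.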

Summarizing our discussion, we have established the following fundamental result
characterizing the stability and equilibrium of structures.


Theorem 6.8. A structure is stable, and so will maintain its equilibrium under arbitrary
external forcing, if and only if its reduced incidence matrix $A^*$ has linearly independent
columns, or, equivalently, $\ker A^* = \{\mathbf{0}\}$. More generally, an external force $\mathbf{f}^*$ on a structure
will maintain equilibrium if and only if $\mathbf{f}^* \in \operatorname{coimg} A^* = (\ker A^*)^{\perp}$, which requires that
the external force be orthogonal to all rigid motions and all mechanisms admitted by the
structure.
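Both halves of the theorem translate directly into rank and least squares tests. The functions `is_stable` and `supports_force` below are hypothetical names for such tests, shown as a sketch on the three-bar structure and its reinforced version; the tolerances are arbitrary.

```python
import numpy as np

def is_stable(A_red, tol=1e-10):
    """Stable exactly when the columns are independent, i.e. ker A* = {0}."""
    return bool(np.linalg.matrix_rank(A_red, tol=tol) == A_red.shape[1])

def supports_force(A_red, f, tol=1e-10):
    """True when f lies in coimg A*, i.e. A*^T y = f has a solution y."""
    y, *_ = np.linalg.lstsq(A_red.T, f, rcond=None)
    return bool(np.linalg.norm(A_red.T @ y - f) < tol)

r, t = 1 / np.sqrt(2), 1 / np.sqrt(10)
three_bar = np.array([[r, r, 0, 0], [-1, 0, 1, 0], [0, 0, -r, r]])
reinforced = np.vstack([three_bar, [-3 * t, t, 0, 0]])

vertical = np.array([0.0, 1.0, 0.0, 1.0])
horizontal = np.array([1.0, 0.0, 1.0, 0.0])

print(is_stable(three_bar), is_stable(reinforced))
print(supports_force(three_bar, vertical), supports_force(three_bar, horizontal))
print(supports_force(reinforced, horizontal))
```

Testing solvability of $A^{*T}\mathbf{y} = \mathbf{f}$ is equivalent to testing orthogonality to the kernel, and it avoids computing a kernel basis explicitly.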


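Example 6.9. A three-dimensional swing set is to be constructed, consisting of two
diagonal supports at each end joined by a horizontal cross bar. Is this configuration stable,
i.e., can a child swing on it without it collapsing? The movable joints are at positions
$$\mathbf{a}_1 = (1, 1, 3)^T, \qquad \mathbf{a}_2 = (4, 1, 3)^T,$$
while the four fixed supports are at
$$\mathbf{a}_3 = (0, 0, 0)^T, \qquad \mathbf{a}_4 = (0, 2, 0)^T, \qquad \mathbf{a}_5 = (5, 0, 0)^T, \qquad \mathbf{a}_6 = (5, 2, 0)^T.$$
The reduced incidence matrix for the structure is calculated in the usual manner:
$$A^* = \begin{pmatrix} \frac{1}{\sqrt{11}} & \frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} & 0 & 0 & 0 \\ \frac{1}{\sqrt{11}} & -\frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -\frac{1}{\sqrt{11}} & \frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} \\ 0 & 0 & 0 & -\frac{1}{\sqrt{11}} & -\frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} \end{pmatrix}.$$
For instance, the first three entries contained in the first row refer to the unit vector
$$\mathbf{n}_1 = \frac{\mathbf{a}_1 - \mathbf{a}_3}{\| \mathbf{a}_1 - \mathbf{a}_3 \|}$$
in the direction of the bar going from $\mathbf{a}_3$ to $\mathbf{a}_1$. Suppose the five bars
have the same stiffness $c_1 = \cdots = c_5 = 1$, so the reduced stiffness matrix for the structure
is
$$K^* = (A^*)^T A^* = \begin{pmatrix} \frac{13}{11} & 0 & \frac{6}{11} & -1 & 0 & 0 \\ 0 & \frac{2}{11} & 0 & 0 & 0 & 0 \\ \frac{6}{11} & 0 & \frac{18}{11} & 0 & 0 & 0 \\ -1 & 0 & 0 & \frac{13}{11} & 0 & -\frac{6}{11} \\ 0 & 0 & 0 & 0 & \frac{2}{11} & 0 \\ 0 & 0 & 0 & -\frac{6}{11} & 0 & \frac{18}{11} \end{pmatrix}.$$
Solving $A^* \mathbf{z}^* = \mathbf{0}$, we find $\ker A^* = \ker K^*$ is one-dimensional, spanned by
$$\mathbf{z}^* = (3, 0, -1, 3, 0, 1)^T.$$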


This  indicates  a  mechanism  that  can  cause  the  swing  set  to  collapse:  the  first  node  moves 
down  and  to  the  right,  while  the  second  node  moves  up  and  to  the  right,  the  horizontal 
motion being three times as large as the vertical. The swing set can only support forces
$\mathbf{f}_1 = (f_1, g_1, h_1)^T$, $\mathbf{f}_2 = (f_2, g_2, h_2)^T$ on free nodes whose combined force vector $\mathbf{f}^*$ is
orthogonal to the mechanism vector $\mathbf{z}^*$, and so
$$3\,(f_1 + f_2) - h_1 + h_2 = 0.$$

Otherwise,  a  reinforcing  bar,  say  from  node  1  to  node  6  (although  this  will  interfere  with 
the  swinging!)  or  another  bar  connecting  one  of  the  nodes  to  a  new  ground  support,  will 
be  required  to  completely  stabilize  the  swing. 

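For a uniform downwards unit vertical force, $\mathbf{f} = (0, 0, -1, 0, 0, -1)^T$, a particular
solution to (6.11) is $\mathbf{u}^* = \left( \tfrac{13}{6},\ 0,\ -\tfrac43,\ \tfrac{11}{6},\ 0,\ 0 \right)^T$, and the general solution $\mathbf{u} = \mathbf{u}^* + t\,\mathbf{z}^*$ is
obtained by adding in an arbitrary element of the kernel. The resulting forces/elongations
are uniquely determined,
$$\mathbf{y} = \mathbf{e} = A^* \mathbf{u} = A^* \mathbf{u}^* = \left( -\tfrac{\sqrt{11}}{6},\ -\tfrac{\sqrt{11}}{6},\ -\tfrac13,\ -\tfrac{\sqrt{11}}{6},\ -\tfrac{\sqrt{11}}{6} \right)^T,$$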

so  that  every  bar  is  compressed,  the  middle  one  experiencing  slightly  more  than  half  the 
stress  of  the  outer  supports. 
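The swing set computation can be reproduced directly from the node positions. The sketch below assembles the reduced incidence matrix bar by bar; `reduced_incidence` is a hypothetical helper, the bar list follows the description in the example, and unit stiffnesses are assumed.

```python
import numpy as np

positions = {
    1: (1, 1, 3), 2: (4, 1, 3),                                # movable joints
    3: (0, 0, 0), 4: (0, 2, 0), 5: (5, 0, 0), 6: (5, 2, 0),    # fixed supports
}
free_ids = [1, 2]
bars = [(3, 1), (4, 1), (1, 2), (2, 5), (2, 6)]

def reduced_incidence(pos, free_ids, bars):
    """Rows hold -n at the start node and +n at the end node; fixed nodes get no columns."""
    col = {node: 3 * k for k, node in enumerate(free_ids)}
    A = np.zeros((len(bars), 3 * len(free_ids)))
    for row, (i, j) in enumerate(bars):
        n = np.subtract(pos[j], pos[i]).astype(float)
        n /= np.linalg.norm(n)
        if i in col:
            A[row, col[i]:col[i] + 3] = -n
        if j in col:
            A[row, col[j]:col[j] + 3] = n
    return A

A_red = reduced_incidence(positions, free_ids, bars)
K = A_red.T @ A_red
z = np.array([3.0, 0.0, -1.0, 3.0, 0.0, 1.0])
f = np.array([0.0, 0.0, -1.0, 0.0, 0.0, -1.0])

print(np.linalg.matrix_rank(A_red))   # rank 5 of 6 columns: one mechanism
print(np.abs(A_red @ z).max())        # z spans the kernel
print(z @ f)                          # the load is orthogonal to the mechanism

u_part, *_ = np.linalg.lstsq(K, f, rcond=1e-10)   # one particular solution
y = A_red @ u_part                                # elongations = internal forces
print(y)
```

The particular solution returned by `lstsq` is the minimum-norm one, so it need not match the particular solution quoted in the example, but the elongations it produces are the same.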

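If we add in two vertical supports at the nodes, as in Figure 6.19, then the corresponding
reduced incidence matrix
$$A^* = \begin{pmatrix} \frac{1}{\sqrt{11}} & \frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} & 0 & 0 & 0 \\ \frac{1}{\sqrt{11}} & -\frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -\frac{1}{\sqrt{11}} & \frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} \\ 0 & 0 & 0 & -\frac{1}{\sqrt{11}} & -\frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
has trivial kernel, indicating stabilization of the structure. The reduced stiffness matrix
$$K^* = \begin{pmatrix} \frac{13}{11} & 0 & \frac{6}{11} & -1 & 0 & 0 \\ 0 & \frac{2}{11} & 0 & 0 & 0 & 0 \\ \frac{6}{11} & 0 & \frac{29}{11} & 0 & 0 & 0 \\ -1 & 0 & 0 & \frac{13}{11} & 0 & -\frac{6}{11} \\ 0 & 0 & 0 & 0 & \frac{2}{11} & 0 \\ 0 & 0 & 0 & -\frac{6}{11} & 0 & \frac{29}{11} \end{pmatrix}$$
is now only slightly different, but this is enough to make it positive definite, $K^* > 0$, and so
allow arbitrary external forcing without collapse. Under the same uniform vertical force,
the internal forces are
$$\mathbf{y} = \mathbf{e} = A^* \mathbf{u}^* = \left( -\tfrac{\sqrt{11}}{10},\ -\tfrac{\sqrt{11}}{10},\ -\tfrac15,\ -\tfrac{\sqrt{11}}{10},\ -\tfrac{\sqrt{11}}{10},\ -\tfrac25,\ -\tfrac25 \right)^T.$$
Note the overall reductions in stress in the original bars; the two reinforcing vertical bars
are now experiencing the largest compression.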
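A short check of the reinforced swing set, again assuming unit stiffnesses. The last two rows represent the two vertical supports, modelled here as bars running straight up from the ground to each movable joint; that modelling choice is an assumption of this sketch.

```python
import numpy as np

c = 1 / np.sqrt(11)
# Columns: (u1, v1, w1, u2, v2, w2) for the two movable joints.
A_red = np.array([
    [   c,    c, 3 * c, 0.0, 0.0,   0.0],
    [   c,   -c, 3 * c, 0.0, 0.0,   0.0],
    [-1.0,  0.0,   0.0, 1.0, 0.0,   0.0],
    [ 0.0,  0.0,   0.0,  -c,   c, 3 * c],
    [ 0.0,  0.0,   0.0,  -c,  -c, 3 * c],
    [ 0.0,  0.0,   1.0, 0.0, 0.0,   0.0],   # vertical support under joint 1
    [ 0.0,  0.0,   0.0, 0.0, 0.0,   1.0],   # vertical support under joint 2
])
f = np.array([0.0, 0.0, -1.0, 0.0, 0.0, -1.0])   # uniform downward unit force

K = A_red.T @ A_red
eigenvalues = np.linalg.eigvalsh(K)
u = np.linalg.solve(K, f)
y = A_red @ u

print(eigenvalues.min())   # strictly positive: K is positive definite
print(u)                   # unique displacement
print(y)                   # internal forces in the seven bars
```

Comparing `y` with the unreinforced result shows the load shifting from the diagonal supports onto the two vertical bars.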


Further  developments  in  the  mathematical  analysis  of  structures  can  be  found  in  the 
references  [33,  79]. 


Exercises 

6.3.1.  If  a  bar  in  a  structure  compresses  2  cm  under  a  force  of  5  newtons  applied  to  a  node, 
how  far  will  it  compress  under  a  force  of  20  newtons  applied  at  the  same  node? 

6.3.2.  An  individual  bar  in  a  structure  experiences  a  stress  of  3  under  a  unit  horizontal  force 
applied  to  all  the  nodes  and  a  stress  of  —2  under  a  unit  vertical  force  applied  to  all  nodes. 
What  combinations  of  horizontal  and  vertical  forces  will  make  the  bar  stress-free? 

6.3.3. (a) For the reinforced structure illustrated in Figure 6.16, determine the displacements of
the nodes and the stresses in the bars under a uniform horizontal force, and interpret
physically. (b) Answer the same question for the doubly reinforced structure in
Figure 6.17.

6.3.4. Discuss the effect of a uniform horizontal force in the direction of the horizontal bar on
the swing set and its reinforced version in Example 6.9.

6.3.5. All the bars in the illustrated square planar structure have unit
stiffness. (a) Write down the reduced incidence matrix $A$. (b) Write
down the equilibrium equations for the structure when subjected to
external forces at the free nodes. (c) Is the structure stable? statically
determinate? Explain in detail. (d) Find a set of external forces
with the property that the upper left node moves horizontally, while
the upper right node stays in place. Which bar is under the most stress?

6.3.6. In the square structure of Exercise 6.3.5, the diagonal struts simply cross each other. We
could also try joining them at an additional central node. Compare the stresses in the two
structures under a uniform horizontal and a uniform vertical force at the two upper nodes,
and discuss what you observe.


6.3.7. (a) Write down the reduced incidence matrix $A^*$ for the pictured
structure with 4 bars and 2 fixed supports. The width and the
height of the vertical sides are each 1 unit, while the top node
is 1.5 units above the base. (b) Predict the number of independent
solutions to $A^* \mathbf{u} = \mathbf{0}$, and then solve to describe them both
numerically and geometrically. (c) What condition(s) must be
imposed on the external forces to maintain equilibrium in the
structure? (d) Add in just enough additional bars so that the
resulting reinforced structure has only the trivial solution to
$A^* \mathbf{u} = \mathbf{0}$. Is your reinforced structure stable?


6.3.8. Consider the two-dimensional “house” constructed out of bars,
as  in  the  accompanying  picture.  The  bottom  nodes  are  fixed.  The 
width  of  the  house  is  3  units,  the  height  of  the  vertical  sides  1  unit, 
and  the  peak  is  1.5  units  above  the  base. 

(a)  Determine  the  reduced  incidence  matrix  A  for  this  structure. 

(b)  How  many  distinct  modes  of  instability  are  there?  Describe  them 
geometrically,  and  indicate  whether  they  are  mechanisms  or  rigid  motions. 

(c)  Suppose  we  apply  a  combination  of  forces  to  each  non-fixed  node  in  the  structure. 
Determine  conditions  such  that  the  structure  can  support  the  forces.  Write  down  an 
explicit  nonzero  set  of  external  forces  that  satisfy  these  conditions,  and  compute  the 
corresponding  elongations  of  the  individual  bars.  Which  bar  is  under  the  most  stress? 

(d)  Add  in  a  minimal  number  of  bars  so  that  the  resulting  structure  can  support  any 
force.  Before  starting,  decide,  from  general  principles,  how  many  bars  you  need  to  add. 

(e)  With  your  new  stable  configuration,  use  the  same  force  as  before,  and  recompute 
the  forces  on  the  individual  bars.  Which  bar  now  has  the  most  stress?  How  much  have 
you  reduced  the  maximal  stress  in  your  reinforced  building? 


6.3.9. Answer Exercise 6.3.8 for the illustrated two- and three-dimensional houses. In the
two-dimensional case, the width and total height of the vertical bars is 2 units, and the peak
is an additional .5 unit higher. In the three-dimensional house, the width and vertical heights
are equal to 1 unit, the length is 3 units, while the peaks are 1.5 units above the base.


6.3.10. Consider a structure consisting of three bars joined in a vertical line hanging from a top
support. (a) Write down the equilibrium equations for this system when only forces and
displacements in the vertical direction are allowed, i.e., a one-dimensional structure. Is the
problem statically determinate, statically indeterminate, or unstable? If the latter, describe
all possible mechanisms and the constraints on the forces required to maintain equilibrium.
(b) Answer part (a) when the structure is two-dimensional, i.e., is allowed to move in a
plane. (c) Answer the same question for the fully three-dimensional version.

6.3.11. A space station is built in the shape of a three-dimensional simplex whose nodes are at
the positions $\mathbf{0}, \mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3 \in \mathbb{R}^3$, and each pair of nodes is connected by a bar. (a) Sketch
the space station and find its incidence matrix $A$. (b) Show that $\ker A$ is six-dimensional,
and find a basis. (c) Explain which three basis vectors correspond to rigid translations.
(d) Find three basis vectors that correspond to linear approximations to rotations around
the three coordinate axes. (e) Suppose the bars all have unit stiffness. Compute the full
stiffness matrix for the space station. (f) What constraints on external forces at the four
nodes are required to maintain equilibrium? Can you interpret them physically? (g) How
many nodes do you need to fix to stabilize the structure? (h) Suppose you fix the three
nodes in the $xy$-plane. How much internal force does each bar experience under a unit
vertical force on the upper vertex?

6.3.12. Suppose a space station is built in the shape of a regular tetrahedron with all sides of
unit length. Answer all questions in Exercise 6.3.11.

6.3.13. A mass-spring ring consists of $n$ masses connected in a circle by $n$ identical springs,
and the masses are allowed only to move in the angular direction. (a) Derive the equations
of equilibrium. (b) Discuss stability, and characterize the external forces that will maintain
equilibrium. (c) Find such a set of nonzero external forces in the case of a four-mass
ring and solve the equilibrium equations. What does the nonuniqueness of the solution
represent?

6.3.14. A structure in $\mathbb{R}^3$ has $n$ movable nodes, admits no rigid motions, and is statically
determinate. (a) How many bars must it have? (b) Find an example with $n = 3$.

6.3.15. Prove that if we apply a unit force to node $i$ in a structure and measure the
displacement of node $j$ in the direction of the force, then we obtain the same value if
we apply the force to node $j$ and measure the displacement at node $i$ in the same direction.
Hint: First, solve Exercise 6.1.6.

6.3.16. True or false: A structure in $\mathbb{R}^3$ will admit no rigid motions if and only if at least 3
nodes are fixed.


6.3.17. Suppose all bars have unit stiffness. Explain why the internal forces in a structure form
the solution of minimal Euclidean norm among all solutions to $A^T \mathbf{y} = \mathbf{f}$.

6.3.18. Let $A$ be the reduced incidence matrix for a structure and $C$ the diagonal bar stiffness
matrix. Suppose $\mathbf{f}$ is a set of external forces that maintain equilibrium of the structure.
(a) Prove that $\mathbf{f} = A^T C \mathbf{g}$ for some $\mathbf{g}$. (b) Prove that an allowable displacement $\mathbf{u}$
is a least squares solution to the system $A \mathbf{u} = \mathbf{g}$ with respect to the weighted norm
$\| \mathbf{v} \|^2 = \mathbf{v}^T C \mathbf{v}$.


6.3.19. Suppose an unstable structure admits no rigid motions, only mechanisms. Let $\mathbf{f}$ be
an external force on the structure that maintains equilibrium. Suppose that you stabilize
the structure by adding in the minimal number of reinforcing bars. Prove that the given
force $\mathbf{f}$ induces the same stresses in the original bars, while the reinforcing bars experience
no stress. Are the displacements necessarily the same? Does the result continue to hold
when more reinforcing bars are added to the structure? Hint: Use Exercise 6.3.18.


6.3.20. When a node is fixed to a roller, it is permitted to move only along a straight line,
the direction of the roller. Consider the three-bar structure in Example 6.5. Suppose node
1 is fixed, but node 4 is attached to a roller that permits it to move only in the horizontal
direction. (a) Construct the reduced incidence matrix and the equilibrium equations in this
situation. You should have a system of 5 equations in 5 unknowns: the horizontal and
vertical displacements of nodes 2 and 3 and the horizontal displacement of node 4. (b) Is
your structure stable? If not, how many rigid motions and how many mechanisms does it
permit?


6.3.21. Answer Exercise 6.3.20 when the roller at node 4 allows it to move in only the vertical
direction.

6.3.22. Redo Exercises 6.3.20-21 for the reinforced structure in Figure 6.16.

6.3.23.  (a)  Suppose  that  we  fix  one  node  in  a  planar  structure  and  put  a  second  node  on  a 
roller.  Does  the  structure  admit  any  rigid  motions?  (b)  How  many  rollers  are  needed  to 
prevent  all  rigid  motions  in  a  three-dimensional  structure?  Are  there  any  restrictions  on  the 
directions  of  the  rollers? 

6.3.24.  True  or  false:  If  a  structure  is  statically  indeterminate,  then  every  non-zero  applied 
force  will  result  in  (a)  one  or  more  nodes  having  a  non-zero  displacement;  (b)  one  or  more 
bars  having  a  non-zero  elongation. 

6.3.25.  True  or  false:  If  a  structure  constructed  out  of  bars  with  identical  stiffnesses  is  stable, 
then  the  same  structure  constructed  out  of  bars  with  differing  stiffnesses  is  also  stable. 


Chapter 7
Linearity


We began this book by learning how to systematically solve systems of linear algebraic
equations. This “elementary” problem formed our launching pad for developing the
fundamentals of linear algebra. In its initial form, matrices and vectors were the primary
focus of our study, but the theory was developed in a sufficiently general and abstract form
that it can be immediately used in many other useful situations, particularly
infinite-dimensional function spaces. Indeed, applied mathematics deals, not just with algebraic
equations, but also with differential equations, difference equations, integral equations,
stochastic systems, differential delay equations, control systems, and many other types,
only a few of which, unfortunately, can be adequately developed in this introductory text.
It is now time to assemble what we have learned about linear algebraic systems and place
the results in a suitably general framework that will lead to insight into the key principles
that govern all linear systems arising in mathematics and its applications.

The most basic underlying object of linear systems theory is the vector space, and we
have already seen that the elements of vector spaces can be vectors, or functions, or even
vector-valued functions. The seminal ideas of span, linear independence, basis, and
dimension are equally applicable and equally vital in more general contexts, particularly function
spaces. Just as vectors in Euclidean space are prototypes for elements of general vector
spaces, matrices are also prototypes for more general objects, known as linear functions.
Linear functions are also known as linear maps or, when one is dealing with function spaces,
linear operators, and include linear differential operators, linear integral operators,
function evaluation, and many other basic operations. Linear operators on infinite-dimensional
function spaces are the basic objects of quantum mechanics. Each quantum mechanical
observable (mass, energy, momentum) is formulated as a linear operator on an
infinite-dimensional Hilbert space, the space of wave functions or states of the system, [54].
It is remarkable that quantum mechanics is an entirely linear theory, whereas classical
and relativistic mechanics are inherently nonlinear. The holy grail of modern physics,
the unification of general relativity and quantum mechanics, is to resolve the apparent
incompatibility of the microscopic linear and macroscopic nonlinear physical regimes.

In geometry, linear functions are interpreted as linear transformations of space (or
space-time), and, as such, lie at the foundations of motion of bodies, such as satellites and planets;
computer graphics and games; video, animation, and movies; and the mathematical
formulation of symmetry. Many familiar geometrical transformations, including rotations,
scalings and stretches, reflections, projections, shears, and screw motions, are linear. But
including translational motions requires a slight extension of linearity, known as an affine
transformation. The basic geometry of linear and affine transformations will be developed
in Section 7.2.

Linear functions form the simplest class of functions on vector spaces, and must be
thoroughly understood before any serious progress can be made in the vastly more
complicated nonlinear world. Indeed, nonlinear functions are often approximated by linear
functions, generalizing the calculus approximation of a scalar function by its tangent line.
This linearization process is applied to nonlinear functions of several variables studied in
multivariable calculus, as well as the nonlinear systems arising in physics and mechanics,
which can often be well approximated by linear differential equations.

A  linear  system  is  just  an  equation  formed  by  a  linear  function.  The  most  basic  linear 
system  is  a  system  of  linear  algebraic  equations.  Linear  systems  theory  includes  linear 
differential  equations,  linear  boundary  value  problems,  linear  integral  equations,  and  so 
on,  all  in  a  common  conceptual  framework.  The  fundamental  ideas  of  linear  superposition 
and  the  relation  between  the  solutions  to  inhomogeneous  and  homogeneous  systems  are 
universally  applicable  to  all  linear  systems.  You  have  no  doubt  encountered  many  of  these 
concepts  in  your  study  of  elementary  ordinary  differential  equations.  In  this  text,  they  have 
already  appeared  in  our  discussion  of  the  solutions  to  linear  algebraic  systems.  The  final 
section  introduces  the  notion  of  the  adjoint  of  a  linear  map  between  inner  product  spaces, 
generalizing  the  transpose  operation  on  matrices,  the  notion  of  a  positive  definite  linear 
operator,  and  the  characterization  of  the  solution  to  such  a  linear  system  by  a  minimization 
principle.  The  full  import  of  these  fundamental  concepts  will  appear  in  the  context  of  linear 
boundary  value  problems  and  partial  differential  equations,  [61]. 


7.1  Linear  Functions 


We  begin  our  study  of  linear  functions  with  the  basic  definition.  For  simplicity,  we  shall 
concentrate  on  real  linear  functions  between  real  vector  spaces.  Extending  the  concepts 
and  constructions  to  complex  linear  functions  on  complex  vector  spaces  is  not  difficult,  and 
will  be  dealt  with  in  due  course. 

Definition 7.1. Let $V$ and $W$ be real vector spaces. A function $L\colon V \to W$ is called linear
if it obeys two basic rules:

\[ L[\mathbf{v} + \mathbf{w}] = L[\mathbf{v}] + L[\mathbf{w}], \qquad L[c\,\mathbf{v}] = c\,L[\mathbf{v}], \tag{7.1} \]

for all $\mathbf{v}, \mathbf{w} \in V$ and all scalars $c$. We will call $V$ the domain and $W$ the codomain† for $L$.

In particular, setting $c = 0$ in the second condition implies that a linear function always
maps the zero element $\mathbf{0} \in V$ to the zero element‡ $\mathbf{0} \in W$, so

\[ L[\mathbf{0}] = \mathbf{0}. \tag{7.2} \]

We can readily combine the two defining conditions (7.1) into a single rule

\[ L[c\,\mathbf{v} + d\,\mathbf{w}] = c\,L[\mathbf{v}] + d\,L[\mathbf{w}] \qquad \text{for all} \quad \mathbf{v}, \mathbf{w} \in V, \quad c, d \in \mathbb{R}, \tag{7.3} \]

that characterizes linearity of a function $L$. An easy induction proves that a linear function
respects linear combinations, so

\[ L[c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k] = c_1 L[\mathbf{v}_1] + \cdots + c_k L[\mathbf{v}_k] \tag{7.4} \]

for all $c_1, \ldots, c_k \in \mathbb{R}$ and $\mathbf{v}_1, \ldots, \mathbf{v}_k \in V$.

† The terms “range” and “target” are also sometimes used for the codomain. However, some
authors use “range” to mean the image of $L$. An alternative name for domain is “source”.

‡ We will use the same notation for these two zero elements even though they may belong to
different vector spaces. The reader should be able to determine where each lives from the context.

The interchangeable terms linear map, linear operator, and, when $V = W$, linear
transformation are all commonly used as alternatives to “linear function”, depending on the
circumstances and taste of the author. The term “linear operator” is particularly useful
when the underlying vector space is a function space, so as to avoid confusing the two
different uses of the word “function”. As usual, we will often refer to the elements of a
vector space as “vectors”, even though they might be functions or matrices or something
else, depending on the context.
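
The combined rule (7.3) lends itself to a quick numerical sanity check. The sketch below is only an illustration, not part of the theory: the helper name `looks_linear` and the two sample maps are hypothetical, vectors are stored as Python lists, and passing the test on random samples is merely evidence of linearity, whereas a single failure does disprove it.

```python
import random

def looks_linear(L, dim, trials=200, tol=1e-9):
    """Sample-based check of rule (7.3): L[c v + d w] == c L[v] + d L[w].

    Passing is only evidence of linearity, not a proof; failing is a disproof.
    """
    rng = random.Random(0)  # fixed seed so the check is reproducible
    for _ in range(trials):
        v = [rng.uniform(-5, 5) for _ in range(dim)]
        w = [rng.uniform(-5, 5) for _ in range(dim)]
        c, d = rng.uniform(-5, 5), rng.uniform(-5, 5)
        combo = [c * vi + d * wi for vi, wi in zip(v, w)]
        lhs = L(combo)
        rhs = [c * a + d * b for a, b in zip(L(v), L(w))]
        # Compare with a tolerance, since floating-point sums are not exact.
        if any(abs(a - b) > tol * (1 + abs(a) + abs(b)) for a, b in zip(lhs, rhs)):
            return False
    return True

def shear(v):    # hypothetical example: (x, y) -> (x + 2y, y)
    x, y = v
    return [x + 2 * y, y]

def squash(v):   # hypothetical example: (x, y) -> (x*x, y)
    x, y = v
    return [x * x, y]

print(looks_linear(shear, 2))   # True
print(looks_linear(squash, 2))  # False
```

A check of this kind can rule a function out quickly, but establishing linearity still requires verifying (7.1) for all vectors and scalars.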


Example 7.2. The simplest linear function is the zero function $O[\mathbf{v}] = \mathbf{0}$, which maps
every element $\mathbf{v} \in V$ to the zero vector in $W$. Note that, in view of (7.2), this is the only
constant linear function; a nonzero constant function is not, despite its evident simplicity,
linear. Another simple but important linear function is the identity function $I = I_V\colon V \to V$,
which maps $V$ to itself and leaves every vector unchanged: $I[\mathbf{v}] = \mathbf{v}$. Slightly more
generally, the operation of scalar multiplication $M_a[\mathbf{v}] = a\,\mathbf{v}$ by a scalar $a \in \mathbb{R}$ defines
a linear function from $V$ to itself, with $M_0 = O$, the zero function from $V$ to itself, and
$M_1 = I$, the identity function on $V$, appearing as special cases.


Example 7.3. Suppose $V = W = \mathbb{R}$. We claim that every linear function $L\colon \mathbb{R} \to \mathbb{R}$ has the
form

\[ y = L[x] = a\,x, \]

for some constant $a$. Therefore, the only scalar linear functions are those whose graph is a
straight line passing through the origin. To prove this, we write $x \in \mathbb{R}$ as a scalar product
$x = x \cdot 1$. Then, by the second property in (7.1),

\[ L[x] = L[x \cdot 1] = x\,L[1] = a\,x, \qquad \text{where} \quad a = L[1], \]

as claimed.

Warning. Even though the graph of the function

\[ y = a\,x + b, \tag{7.5} \]

is a straight line, it is not a linear function unless $b = 0$, so that the line goes through the
origin. The proper mathematical name for a function of the form (7.5) is an affine function;
see Definition 7.21 below.
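
The warning can be confirmed with a few lines of arithmetic. The following sketch uses hypothetical values $a = 2$, $b = 3$ for the function (7.5) and compares it with the case $b = 0$; the helper name `affine` is ours.

```python
def affine(a, b):
    """Return the scalar function x -> a*x + b of the form (7.5)."""
    return lambda x: a * x + b

f = affine(2.0, 3.0)   # b != 0: affine but not linear
g = affine(2.0, 0.0)   # b == 0: linear

# A linear function must send 0 to 0, see (7.2).
print(f(0.0))   # 3.0
print(g(0.0))   # 0.0

# Additivity: compare the value at 1 + 1 with the sum of the values at 1.
print(f(2.0), f(1.0) + f(1.0))  # 7.0 10.0
print(g(2.0), g(1.0) + g(1.0))  # 4.0 4.0
```

The constant term $b$ is counted twice on the right-hand side of the additivity test, which is exactly why it must vanish.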

Example 7.4. Let $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$. Let $A$ be an $m \times n$ matrix. Then the
function $L[\mathbf{v}] = A\,\mathbf{v}$ given by matrix multiplication is easily seen to be a linear function.
Indeed, the requirements (7.1) reduce to the basic distributivity and scalar multiplication
properties of matrix multiplication:

\[ A(\mathbf{v} + \mathbf{w}) = A\,\mathbf{v} + A\,\mathbf{w}, \qquad A(c\,\mathbf{v}) = c\,A\,\mathbf{v}, \qquad \text{for all} \quad \mathbf{v}, \mathbf{w} \in \mathbb{R}^n, \quad c \in \mathbb{R}. \]

In fact, every linear function between two Euclidean spaces has this form.

Theorem 7.5. Every linear function $L\colon \mathbb{R}^n \to \mathbb{R}^m$ is given by matrix multiplication,
$L[\mathbf{v}] = A\,\mathbf{v}$, where $A$ is an $m \times n$ matrix.


Warning. Pay attention to the order of $m$ and $n$. While $A$ has size $m \times n$, the linear
function $L$ goes from $\mathbb{R}^n$ to $\mathbb{R}^m$.

Proof: The key idea is to look at what the linear function does to the basis vectors. Let
$\mathbf{e}_1, \ldots, \mathbf{e}_n$ be the standard basis of $\mathbb{R}^n$, as in (2.17), and let $\hat{\mathbf{e}}_1, \ldots, \hat{\mathbf{e}}_m$ be the standard
basis of $\mathbb{R}^m$. (We temporarily place hats on the latter to avoid confusing the two.) Since
$L[\mathbf{e}_j] \in \mathbb{R}^m$, we can write it as a linear combination of the latter basis vectors:

\[ L[\mathbf{e}_j] = \mathbf{a}_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix} = a_{1j}\,\hat{\mathbf{e}}_1 + a_{2j}\,\hat{\mathbf{e}}_2 + \cdots + a_{mj}\,\hat{\mathbf{e}}_m, \qquad j = 1, \ldots, n. \tag{7.6} \]

Let us construct the $m \times n$ matrix

\[ A = ( \mathbf{a}_1 \ \ \mathbf{a}_2 \ \ \ldots \ \ \mathbf{a}_n ) = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix} \tag{7.7} \]

whose columns are the image vectors (7.6). Using (7.4), we then compute the effect of $L$
on a general vector $\mathbf{v} = ( v_1, v_2, \ldots, v_n )^T \in \mathbb{R}^n$:

\[ L[\mathbf{v}] = L[v_1 \mathbf{e}_1 + \cdots + v_n \mathbf{e}_n] = v_1 L[\mathbf{e}_1] + \cdots + v_n L[\mathbf{e}_n] = v_1 \mathbf{a}_1 + \cdots + v_n \mathbf{a}_n = A\,\mathbf{v}. \]

The final equality follows from our basic formula (2.13) connecting matrix multiplication
and linear combinations. We conclude that the vector $L[\mathbf{v}]$ coincides with the vector $A\,\mathbf{v}$
obtained by multiplying $\mathbf{v}$ by the coefficient matrix $A$. Q.E.D.

[Figure 7.1. Linear Function on Euclidean Space.]


The proof of Theorem 7.5 shows us how to construct the matrix representative of a
given linear function $L\colon \mathbb{R}^n \to \mathbb{R}^m$. We merely assemble the image column vectors
$\mathbf{a}_1 = L[\mathbf{e}_1], \ldots, \mathbf{a}_n = L[\mathbf{e}_n]$ into an $m \times n$ matrix $A$.
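
This recipe translates directly into a short computation. The sketch below is a minimal illustration under stated assumptions: the helper names `matrix_of` and `matvec` are ours, matrices are stored as lists of rows, and the map `L` from $\mathbb{R}^3$ to $\mathbb{R}^2$ is a hypothetical example chosen only to exercise the construction.

```python
def matrix_of(L, n):
    """Assemble the matrix whose j-th column is L[e_j], as in (7.7)."""
    columns = []
    for j in range(n):
        e_j = [1.0 if i == j else 0.0 for i in range(n)]  # standard basis vector
        columns.append(L(e_j))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def matvec(A, v):
    """Matrix-vector product A v for A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def L(v):  # hypothetical linear map from R^3 to R^2
    x, y, z = v
    return [x - 2 * y, 3 * y + z]

A = matrix_of(L, 3)
v = [1.0, 2.0, 3.0]
print(A)                  # the rows are (1, -2, 0) and (0, 3, 1)
print(matvec(A, v), L(v)) # both equal [-3.0, 9.0]
```

Only the $n$ images $L[\mathbf{e}_j]$ are ever evaluated; the value of $L$ at every other vector is then determined by matrix multiplication.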


The  two  basic  linearity  conditions  (7.1)  have  a  simple  geometrical  interpretation.  Since 
vector  addition  is  the  same  as  completing  the  parallelogram  sketched  in  Figure  7.1,  the 
first  linearity  condition  requires  that  L  map  parallelograms  to  parallelograms.  The  second 
linearity  condition  says  that  if  we  stretch  a  vector  by  a  factor  c,  then  its  image  under  L 
must  also  be  stretched  by  the  same  amount.  Thus,  one  can  often  detect  linearity  by  simply 
looking  at  the  geometry  of  the  function. 


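Example 7.6. As a specific example, consider the function $R_\theta\colon \mathbb{R}^2 \to \mathbb{R}^2$ that rotates
the vectors in the plane around the origin by a specified angle $\theta$. This geometric
transformation clearly preserves parallelograms; see Figure 7.2. It also respects stretching
of vectors, and hence defines a linear function. In order to find its matrix representative,
we need to find out where the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2$ are mapped. Referring to
Figure 7.3, and keeping in mind that the rotated vectors also have unit length, we have

\[ R_\theta[\mathbf{e}_1] = (\cos\theta)\,\mathbf{e}_1 + (\sin\theta)\,\mathbf{e}_2 = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \qquad R_\theta[\mathbf{e}_2] = -(\sin\theta)\,\mathbf{e}_1 + (\cos\theta)\,\mathbf{e}_2 = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}. \]

[Figure 7.3. Rotation in $\mathbb{R}^2$.]

According to the general recipe (7.7), we assemble these two column vectors to obtain the
matrix form of the rotation transformation, and so

\[ R_\theta[\mathbf{v}] = A_\theta\,\mathbf{v}, \qquad \text{where} \quad A_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \tag{7.8} \]

Therefore, rotating a vector $\mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix}$ through angle $\theta$ produces the vector

\[ \hat{\mathbf{v}} = R_\theta[\mathbf{v}] = A_\theta\,\mathbf{v} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix} \]

with coordinates $\hat{x} = x\cos\theta - y\sin\theta$, $\hat{y} = x\sin\theta + y\cos\theta$. These formulas can be proved
directly, but, in fact, are a consequence of the underlying linearity of rotations.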
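
As a numerical illustration of the rotation formulas, the following sketch builds the matrix $A_\theta$ and applies it to the standard basis vectors. The function names are ours, angles are assumed to be in radians, and the results are rounded because floating-point values of the cosine and sine at $\pi/2$ are only approximately $0$ and $1$.

```python
import math

def rotation_matrix(theta):
    """Matrix A_theta of the rotation through angle theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

def rotate(theta, v):
    """Apply the rotation to the vector v = (x, y)."""
    A = rotation_matrix(theta)
    x, y = v
    return (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)

# A quarter turn sends e_1 to e_2 and e_2 to -e_1, up to rounding error.
x1, y1 = rotate(math.pi / 2, (1.0, 0.0))
x2, y2 = rotate(math.pi / 2, (0.0, 1.0))
print(round(x1, 12), round(y1, 12))
print(round(x2, 12), round(y2, 12))

# Rotations preserve length: the image of (3, 4) still has norm 5.
print(math.isclose(math.hypot(*rotate(0.7, (3.0, 4.0))), 5.0))
```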
Exercises

7.1.1. Which of the following functions $F\colon \mathbb{R}^3 \to \mathbb{R}$ are linear? (a) $F(x, y, z) = x$,
(b) $F(x, y, z) = y - 2$, (c) $F(x, y, z) = x + y + 3$, (d) $F(x, y, z) = x - y - z$,
(e) $F(x, y, z) = x\,y\,z$, (f) $F(x, y, z) = x^2 - y^2 + z^2$, (g) $F(x, y, z) = e^{x - y + z}$.


7.1.2.  Explain  why  the  following  functions  F: 


— >  Rz  are  not  linear. 


(a) 


x  -f  2 
x  +  y 


(b) 


X 

y‘ 


(c) 


y 

X 


7.1.3.  Which  of  the  following  functions  F: 


(d) 

2  are  linear? 


sin(x  +  y) 
x-y 


(e) 


x  +  ey 
2  x  +  y 


(a)  F 
(d)  F 


=  I  x~y 
y  j  \x  +  y 

x\  =  (  3  v 

y  1 2* 


(b)  F 


y 


(e)  F 


X 

y 


_  (  x  +y 
2  2 


x  +  y  +  1 
x  —  y  —  1 
2 


(c)  F 


X 

y 


xy 

x-y 


(f)  F 


x 

y 


y  —  3x 
x 


7.1.4. Explain why the translation function $T\colon \mathbb{R}^2 \to \mathbb{R}^2$, defined by
$T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + a \\ y + b \end{pmatrix}$ for $a, b \in \mathbb{R}$, is almost never linear. Precisely when is it linear?

7.1.5. Find a matrix representation for the following linear transformations on $\mathbb{R}^3$:
(a) counterclockwise rotation by 90° around the $z$-axis; (b) clockwise rotation by 60°
around the $x$-axis; (c) reflection through the $(x, y)$-plane; (d) counterclockwise rotation
by 120° around the line $x = y = z$; (e) rotation by 180° around the line $x = y = z$;
(f) orthogonal projection onto the $xy$-plane; (g) orthogonal projection onto the plane
$x - y + 2z = 0$.


7.1.6.  Find  a  linear  function  L: 


7.1.7.  Find  a  linear  function  L: 


such  that  L 


=  2  and  L 


1 

1 


=  3.  Is  it  unique? 


2  such  that  L  f  ^  I  — 


^  j  and  L  f  ^ 


0 

-1 


7.1.8.  Under  what  conditions  does  there  exist  a  linear  function  L: 


9 

such  that 


L 


x 


1  )  =  (  ?1  )  and  L 


x 


2  - 


7  ,  ^  ,  i  =  f  ?2  J  ?  Under  what  conditions  is  L  uniquely  defined?  In 

VlJ  \bl)  V  V2  J  V  b2  ) 

the  latter  case,  write  down  the  matrix  representation  of  L. 


j 

7.1.9.  Can  you  construct  a  linear  function  L:R  — >  R  such  that 


(  b 

(  b 

(  ^ 

L 

-i 

=  1,  L 

0 

=  4,  and  L 

1 

'v  o) 

Ub 

=  —  2?  If  yes,  find  one.  If  not,  explain  why  not. 


♦ 7.1.10. Given $\mathbf{a} = (a, b, c)^T \in \mathbb{R}^3$, prove that the cross product map $L_{\mathbf{a}}[\mathbf{v}] = \mathbf{a} \times \mathbf{v}$, as defined
in (4.2), is linear, and find its matrix representative.

7.1.11. Is the Euclidean norm function $N(\mathbf{v}) = \| \mathbf{v} \|$, for $\mathbf{v} \in \mathbb{R}^n$, linear?

7.1.12. Let $V$ be a vector space. Prove that every linear function $L\colon \mathbb{R} \to V$ has the form
$L[x] = x\,\mathbf{b}$, where $x \in \mathbb{R}$, for some $\mathbf{b} \in V$.

7.1.13. True or false: The quadratic form $Q(\mathbf{v}) = \mathbf{v}^T K\,\mathbf{v}$ defined by a symmetric $n \times n$ matrix
$K$ defines a linear function $Q\colon \mathbb{R}^n \to \mathbb{R}$.

♦ 7.1.14. (a) Prove that $L$ is linear if and only if it satisfies (7.3).
(b) Use induction to prove that $L$ satisfies (7.4).
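7.1.15. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and $B = \begin{pmatrix} p & q \\ r & s \end{pmatrix}$ be $2 \times 2$ matrices. For each of the following functions,
prove that $L\colon \mathcal{M}_{2\times 2} \to \mathcal{M}_{2\times 2}$ defines a linear map, and then find its matrix representative
with respect to the standard basis
$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \ \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \ \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \ \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$ of $\mathcal{M}_{2\times 2}$:
(a) $L[X] = A\,X$, (b) $R[X] = X\,B$, (c) $K[X] = A\,X\,B$.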


7.1.16. The domain space of the following functions is the space of $n \times n$ real matrices $A$.
Which are linear? What is the codomain space in each case? (a) $L[A] = 3A$;
(b) $L[A] = I - A$; (c) $L[A] = A^T$; (d) $L[A] = A^{-1}$; (e) $L[A] = \det A$; (f) $L[A] = \operatorname{tr} A$;
(g) $L[A] = ( a_{11}, \ldots, a_{nn} )^T$, i.e., the vector of diagonal entries of $A$;
(h) $L[A] = A\,\mathbf{v}$, where $\mathbf{v} \in \mathbb{R}^n$; (i) $L[A] = \mathbf{v}^T A\,\mathbf{v}$, where $\mathbf{v} \in \mathbb{R}^n$.

♦ 7.1.17. Let $\mathbf{v}_1, \ldots, \mathbf{v}_n$ be a basis of $V$ and $\mathbf{w}_1, \ldots, \mathbf{w}_n$ be any vectors in $W$. Show that there is
a unique linear function $L\colon V \to W$ such that $L[\mathbf{v}_i] = \mathbf{w}_i$, $i = 1, \ldots, n$.

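♥ 7.1.18. Bilinear functions: Let $V, W, Z$ be vector spaces. A function that takes any pair of
vectors $\mathbf{v} \in V$ and $\mathbf{w} \in W$ to a vector $\mathbf{z} = B[\mathbf{v}, \mathbf{w}] \in Z$ is called bilinear if, for each
fixed $\mathbf{w}$, it is a linear function of $\mathbf{v}$, so $B[c\,\mathbf{v} + d\,\tilde{\mathbf{v}}, \mathbf{w}] = c\,B[\mathbf{v}, \mathbf{w}] + d\,B[\tilde{\mathbf{v}}, \mathbf{w}]$, and, for
each fixed $\mathbf{v}$, it is a linear function of $\mathbf{w}$, so $B[\mathbf{v}, c\,\mathbf{w} + d\,\tilde{\mathbf{w}}] = c\,B[\mathbf{v}, \mathbf{w}] + d\,B[\mathbf{v}, \tilde{\mathbf{w}}]$.
Thus, $B\colon V \times W \to Z$ defines a function on the Cartesian product space $V \times W$, as defined
in Exercise 2.1.13. (a) Show that $B[\mathbf{v}, \mathbf{w}] = -\,2\,v_2 w_2$ is a bilinear function from
$\mathbb{R}^2 \times \mathbb{R}^2$ to $\mathbb{R}$. (b) Show that $B[\mathbf{v}, \mathbf{w}] = 2\,v_1 w_2 - 3\,v_2 w_3$ is a bilinear function from
$\mathbb{R}^2 \times \mathbb{R}^3$ to $\mathbb{R}$. (c) Show that if $V$ is an inner product space, then $B[\mathbf{v}, \mathbf{w}] = \langle \mathbf{v}, \mathbf{w} \rangle$
defines a bilinear function $B\colon V \times V \to \mathbb{R}$. (d) Show that if $A$ is any $m \times n$ matrix, then
$B[\mathbf{v}, \mathbf{w}] = \mathbf{v}^T A\,\mathbf{w}$ defines a bilinear function $B\colon \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$. (e) Show that every
bilinear function $B\colon \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ arises in this way. (f) Show that a vector-valued
function $B\colon \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^k$ defines a bilinear function if and only if each of its components
$B_i\colon \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$, for $i = 1, \ldots, k$, is a bilinear function. (g) True or false: A bilinear
function $B\colon V \times W \to Z$ defines a linear function on the Cartesian product space.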


Linear  Operators 


So  far,  we  have  concentrated  on  linear  functions  on  Euclidean  space,  and  discovered  that 
they  are  all  represented  by  matrices.  For  function  spaces,  there  is  a  much  wider  variety  of 
linear  operators  available,  and  a  complete  classification  is  out  of  the  question.  Let  us  look 
at  some  of  the  main  representative  examples  that  arise  in  applications. 

Example 7.7. (a) Recall that $C^0[a, b]$ denotes the vector space consisting of all continuous
functions on the interval $[a, b]$. Evaluation of the function at a point, namely
$L[f] = f(x_0)$, defines a linear operator $L\colon C^0[a, b] \to \mathbb{R}$, because

\[ L[c f + d g] = (c f + d g)(x_0) = c\,f(x_0) + d\,g(x_0) = c\,L[f] + d\,L[g] \]

for any functions $f, g \in C^0[a, b]$ and scalars (constants) $c, d$.

(b) Another real-valued linear function is the integration operator

\[ I[f] = \int_a^b f(x)\,dx, \tag{7.9} \]

that maps $I\colon C^0[a, b] \to \mathbb{R}$. Linearity of $I$ is an immediate consequence of the basic
integration identity

\[ \int_a^b \bigl[\, c\,f(x) + d\,g(x) \,\bigr]\,dx = c \int_a^b f(x)\,dx + d \int_a^b g(x)\,dx, \]

which is valid for arbitrary integrable (which includes continuous) functions $f, g$ and
constants $c, d$.

(c) We have already seen that multiplication of functions by a constant, $M_c[f(x)] = c\,f(x)$,
defines a linear map $M_c\colon C^0[a, b] \to C^0[a, b]$; the particular case $c = 1$ reduces to the
identity transformation $I = M_1$. More generally, if $a(x) \in C^0[a, b]$ is a given continuous
function, then the operation $M_a[f(x)] = a(x)\,f(x)$ of multiplication by $a$ also defines a
linear transformation $M_a\colon C^0[a, b] \to C^0[a, b]$.

(d) Another important linear transformation is the indefinite integral

\[ J[f] = g, \qquad \text{where} \quad g(x) = \int_a^x f(y)\,dy. \tag{7.10} \]

According to the Fundamental Theorem of Calculus, [2, 78], the integral of a continuous
function is continuously differentiable. Therefore, $J\colon C^0[a, b] \to C^1[a, b]$ defines a linear
operator from the space of continuous functions to the space of continuously differentiable
functions.

(e) Conversely, differentiation of functions is also a linear operation. To be precise,
since not every continuous function can be differentiated, we take the domain space to be
the vector space $C^1[a, b]$ of continuously differentiable functions on the interval $[a, b]$. The
derivative operator

\[ D[f] = f' \tag{7.11} \]

defines a linear operator $D\colon C^1[a, b] \to C^0[a, b]$. This follows from the elementary
differentiation formula

\[ D[c f + d g] = (c f + d g)' = c\,f' + d\,g' = c\,D[f] + d\,D[g], \]

valid whenever $c, d$ are constant.
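
To see the operators $D$ and $J$ act concretely, one can restrict attention to polynomials, which are continuously differentiable on any interval. The sketch below rests on assumptions of ours: a polynomial is stored as its list of coefficients `[p0, p1, p2, ...]`, the lower limit in (7.10) is taken to be $a = 0$, and the helper `combo` forms the linear combination $c\,p + d\,q$.

```python
def D(p):
    """Derivative of a polynomial given by coefficients [p0, p1, p2, ...]."""
    return [k * p[k] for k in range(1, len(p))] or [0.0]

def J(p):
    """Integral from 0 to x of the polynomial, i.e. (7.10) with a = 0."""
    return [0.0] + [p[k] / (k + 1) for k in range(len(p))]

def combo(c, p, d, q):
    """Coefficients of c*p + d*q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [c * a + d * b for a, b in zip(p, q)]

p = [1.0, 0.0, 3.0]        # 1 + 3x^2
q = [0.0, 2.0, 0.0, 4.0]   # 2x + 4x^3

# D[c p + d q] agrees with c D[p] + d D[q].
print(D(combo(2.0, p, -1.0, q)) == combo(2.0, D(p), -1.0, D(q)))  # True
# Differentiating the integral recovers the original polynomial.
print(D(J(p)) == p)  # True
```

On this finite-dimensional subspace both operators reduce to simple manipulations of coefficient lists; on the full function spaces no such finite description is available.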


Exercises 

7.1.19. Which of the following define linear operators on the vector space $C^1(\mathbb{R})$ of
continuously differentiable scalar functions? What is the codomain?

(a) $L[f] = f(0) + f(1)$, (b) $L[f] = f(0)\,f(1)$, (c) $L[f] = f'(1)$, (d) $L[f] = f'(3) - f(2)$,
(e) $L[f] = x^2 f(x)$, (f) $L[f] = f(x + 2)$, (g) $L[f] = f(x) + 2$, (h) $L[f] = f'(2x)$,

0)  Hf]  =  O')  Llf]  =  f(x) sin  x  -  f'(x)  cos  x,  (k)  L[f]  =  21og/(0), 

0)  L[f]  =  f  ey  f{y)dy,  (m)  L[f]  =  f  \f(y)\dy,  (n)  L[f]  =  [  +  f(y)dy, 

(o)  L[f]  =  f  thUdy,  (p)  L[f]  =  ff{  )  ydy,  (q)  L[f]  =  f  y2  f{y)dy, 

Jx  y  JO  JO 

M  L[f]  =  f_x  [f(y)  -  / (0) ]  dy,  (s)  L[f]  =  /_1  \f{y)  -  y]  dy. 

7.1.20. True or false: The average or mean $A[f] = \dfrac{1}{b - a} \displaystyle\int_a^b f(x)\,dx$ of a function on the
interval $[a, b]$ defines a linear operator $A\colon C^0[a, b] \to \mathbb{R}$.

7.1.21. Prove that multiplication $M_h[f(x)] = h(x)\,f(x)$ by a given function $h \in C^n[a, b]$ defines
a linear operator $M_h\colon C^n[a, b] \to C^n[a, b]$. Which result from calculus do you need to
complete the proof?

7.1.22. Show that if $w(x)$ is any continuous function, then the weighted integral
$I_w[f] = \displaystyle\int_a^b f(x)\,w(x)\,dx$ defines a linear operator $I_w\colon C^0[a, b] \to \mathbb{R}$.

7.1.23. (a) Show that the partial derivatives $\partial_x[f] = \dfrac{\partial f}{\partial x}$ and $\partial_y[f] = \dfrac{\partial f}{\partial y}$ both define linear
operators on the space of continuously differentiable functions $f(x, y)$.
(b) For which values of $a, b, c, d$ is the map $L[f] = a\,\dfrac{\partial f}{\partial x} + b\,\dfrac{\partial f}{\partial y} + c\,f + d$ linear?

7.1.24. Prove that the Laplacian operator $\Delta[f] = \dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2}$ defines a linear function on the
vector space of twice continuously differentiable functions $f(x, y)$.

7.1.25. Show that the gradient $G[f] = \nabla f$ defines a linear operator from the space of
continuously differentiable scalar-valued functions $f\colon \mathbb{R}^2 \to \mathbb{R}$ to the space of continuous
vector fields $\mathbf{v}\colon \mathbb{R}^2 \to \mathbb{R}^2$.

7.1.26. Prove that, on $\mathbb{R}^3$, the gradient, curl, and divergence all define linear operators. Be
precise in your description of the domain space and the codomain space in each case.


The  Space  of  Linear  Functions 

Given two vector spaces $V, W$, we use $\mathcal{L}(V, W)$ to denote the set of all† linear functions
$L\colon V \to W$. We claim that $\mathcal{L}(V, W)$ is itself a vector space. We add linear functions
$L, M \in \mathcal{L}(V, W)$ in the same way we add general functions:

\[ (L + M)[\mathbf{v}] = L[\mathbf{v}] + M[\mathbf{v}]. \]

You should check that $L + M$ satisfies the linear function axioms (7.1), provided that $L$
and $M$ do. Similarly, multiplication of a linear function by a scalar $c \in \mathbb{R}$ is defined so
that $(c\,L)[\mathbf{v}] = c\,L[\mathbf{v}]$, again producing a linear function. The zero element of $\mathcal{L}(V, W)$ is
the zero function $O[\mathbf{v}] = \mathbf{0}$. The verification that $\mathcal{L}(V, W)$ satisfies the basic vector space
axioms of Definition 2.1 is left to the reader.

In particular, if $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$, then Theorem 7.5 implies that we can identify
$\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ with the space $\mathcal{M}_{m \times n}$ of all $m \times n$ matrices. Addition of linear functions
corresponds to matrix addition, while scalar multiplication coincides with the usual scalar
multiplication of matrices. (Why?) Therefore, the space of all $m \times n$ matrices is a vector
space, a fact we already knew. The standard basis for $\mathcal{M}_{m \times n}$ is given by the $m\,n$ matrices
$E_{ij}$, $1 \le i \le m$, $1 \le j \le n$, which have a single 1 in the $(i, j)$ position and zeros everywhere
else. Therefore, the dimension of $\mathcal{M}_{m \times n}$ is $m\,n$. Note that $E_{ij}$ corresponds to the specific
linear transformation that maps $E_{ij}[\mathbf{e}_j] = \hat{\mathbf{e}}_i$, while $E_{ij}[\mathbf{e}_k] = \mathbf{0}$ whenever $k \ne j$.


† In infinite-dimensional situations, one usually imposes additional restrictions, e.g., continuity
or boundedness of the linear operators. We shall relegate these more subtle distinctions to a more
advanced treatment of the subject. See [50, 67] for a full discussion of the rather sophisticated
analytical details, which play an important role in serious quantum mechanical applications.

Example 7.8. The space of linear transformations of the plane, $\mathcal{L}(\mathbb{R}^2, \mathbb{R}^2)$, is identified
with the space $\mathcal{M}_{2 \times 2}$ of $2 \times 2$ matrices $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. The standard basis of $\mathcal{M}_{2 \times 2}$ consists
of the $4 = 2 \cdot 2$ matrices

\[ E_{11} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad E_{12} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad E_{21} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \qquad E_{22} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. \]

Indeed, we can uniquely write any other matrix

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = a\,E_{11} + b\,E_{12} + c\,E_{21} + d\,E_{22}, \]

as a linear combination of these four basis matrices. Of course, as with any vector space,
this is but one of many other possible bases of $\mathcal{L}(\mathbb{R}^2, \mathbb{R}^2)$.
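
The expansion of a matrix in the standard basis can be checked mechanically. In the sketch below, which is only an illustration, matrices are lists of rows, the helper names `E` and `expand` are ours, and the sample matrix is hypothetical.

```python
def E(i, j, m, n):
    """m x n matrix with a single 1 in position (i, j), using 1-based indices."""
    return [[1 if (r, c) == (i - 1, j - 1) else 0 for c in range(n)]
            for r in range(m)]

def expand(A):
    """Rebuild A as the sum of A[i][j] * E_ij over all positions (i, j)."""
    m, n = len(A), len(A[0])
    total = [[0] * n for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            basis = E(i, j, m, n)
            for r in range(m):
                for c in range(n):
                    total[r][c] += A[i - 1][j - 1] * basis[r][c]
    return total

A = [[5, -1], [2, 7]]
print(E(1, 2, 2, 2))    # [[0, 1], [0, 0]]
print(expand(A) == A)   # True
```

The coefficients in the expansion are simply the entries of the matrix, which is why the dimension of the space equals the number of entries.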


Dual  Spaces 

A  particularly  important  case  is  that  in  which  the  codomain  of  the  linear  functions  is  R. 

Definition 7.9. The dual space to a vector space $V$ is the vector space $V^* = \mathcal{L}(V, \mathbb{R})$
consisting of all real-valued linear functions $\ell\colon V \to \mathbb{R}$.


If $V = \mathbb{R}^n$, then, by Theorem 7.5, every linear function $\ell\colon \mathbb{R}^n \to \mathbb{R}$ is given by
multiplication by a $1 \times n$ matrix, i.e., a row vector. Explicitly,

\[ \ell[\mathbf{v}] = \mathbf{a}\,\mathbf{v} = a_1 v_1 + \cdots + a_n v_n, \qquad \text{where} \quad \mathbf{a} = ( a_1 \ \ a_2 \ \ \ldots \ \ a_n ), \quad \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}. \]

Therefore, we can identify the dual space $(\mathbb{R}^n)^*$ with the space of row vectors with $n$ entries.
In light of this observation, the distinction between row vectors and column vectors is now
seen to be much more sophisticated than mere semantics or notation. Row vectors should
more properly be viewed as real-valued linear functions, the dual objects to column
vectors.

The standard dual basis $\varepsilon_1, \ldots, \varepsilon_n$ of $(\mathbb{R}^n)^*$ consists of the standard row basis vectors;
namely, $\varepsilon_j$ is the row vector with 1 in the $j$th slot and zeros elsewhere. The $j$th dual basis
element defines the linear function

\[ \mathbf{v} \ \longmapsto \ \varepsilon_j\,\mathbf{v} = v_j, \]

which picks off the $j$th coordinate of $\mathbf{v}$ with respect to the original basis $\mathbf{e}_1, \ldots, \mathbf{e}_n$.
Thus, the dimensions of $V = \mathbb{R}^n$ and its dual $V^* = (\mathbb{R}^n)^*$ are both equal to $n$.
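
Viewing a row vector as a function is easy to make concrete. The following sketch is an illustration under our own conventions: row and column vectors are both plain Python lists, the helper names are ours, and the sample vectors are hypothetical.

```python
def apply_row(a, v):
    """Value of the linear function v -> a v defined by the row vector a."""
    return sum(ai * vi for ai, vi in zip(a, v))

def dual_basis_row(j, n):
    """Row vector with 1 in slot j (1-based) and zeros elsewhere."""
    return [1 if k == j - 1 else 0 for k in range(n)]

v = [4, -2, 9]
# Each standard dual basis element picks off one coordinate of v.
print([apply_row(dual_basis_row(j, 3), v) for j in (1, 2, 3)])  # [4, -2, 9]
# A general row vector combines the coordinates: 1*4 + 0*(-2) + (-2)*9.
print(apply_row([1, 0, -2], v))  # -14
```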

An  inner  product  structure  provides  a  mechanism  for  identifying  a  vector  space  and  its 
dual.  However,  it  should  be  borne  in  mind  that  this  identification  will  depend  upon  the 
choice  of  inner  product. 


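Theorem 7.10. Let $V$ be a finite-dimensional real inner product space. Then every linear
function $\ell\colon V \to \mathbb{R}$ is given by taking the inner product with a fixed vector $\mathbf{a} \in V$:

\[ \ell[\mathbf{v}] = \langle \mathbf{a}, \mathbf{v} \rangle. \tag{7.12} \]

Proof: Let $\mathbf{v}_1, \ldots, \mathbf{v}_n$ be a basis of $V$. If we write $\mathbf{v} = y_1 \mathbf{v}_1 + \cdots + y_n \mathbf{v}_n$, then, by linearity,

\[ \ell[\mathbf{v}] = y_1\,\ell[\mathbf{v}_1] + \cdots + y_n\,\ell[\mathbf{v}_n] = b_1 y_1 + \cdots + b_n y_n, \qquad \text{where} \quad b_i = \ell[\mathbf{v}_i]. \tag{7.13} \]

On the other hand, if we write $\mathbf{a} = x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n$, then

\[ \langle \mathbf{a}, \mathbf{v} \rangle = \sum_{i,j=1}^{n} x_i\,y_j\,\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{i,j=1}^{n} g_{ij}\,x_i\,y_j, \tag{7.14} \]

where $G = (g_{ij})$ is the $n \times n$ Gram matrix with entries $g_{ij} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle$. Equality of (7.13) and (7.14)
for all $\mathbf{v}$ requires that $G\,\mathbf{x} = \mathbf{b}$, where $\mathbf{x} = ( x_1, x_2, \ldots, x_n )^T$ and $\mathbf{b} = ( b_1, b_2, \ldots, b_n )^T$.
Invertibility of $G$, as guaranteed by Theorem 3.34, allows us to solve for $\mathbf{x} = G^{-1}\mathbf{b}$ and thereby construct
the desired vector $\mathbf{a}$. In particular, if $\mathbf{v}_1, \ldots, \mathbf{v}_n$ is an orthonormal basis, then $G = I$ and
hence $\mathbf{a} = b_1 \mathbf{v}_1 + \cdots + b_n \mathbf{v}_n$. Q.E.D.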
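
The proof is constructive, and in two dimensions the linear system $G\,\mathbf{x} = \mathbf{b}$ can be solved by Cramer's rule. The sketch below is a minimal illustration under stated assumptions: the function name `representing_vector`, the linear function `ell`, the basis, and the weighted inner product are all hypothetical choices of ours, and exact rational arithmetic from the standard `fractions` module is used so that the final comparison is exact.

```python
from fractions import Fraction

def dot(u, w):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, w))

def representing_vector(ell, basis, inner=dot):
    """Find a with ell[v] = <a, v> on a 2-dimensional space, following the proof.

    Solves G x = b by Cramer's rule, where g_ij = <v_i, v_j> and b_i = ell[v_i].
    """
    v1, v2 = basis
    g11, g12, g22 = inner(v1, v1), inner(v1, v2), inner(v2, v2)
    b1, b2 = ell(v1), ell(v2)
    det = g11 * g22 - g12 * g12   # nonzero because the Gram matrix is invertible
    x1 = Fraction(b1 * g22 - g12 * b2, det)
    x2 = Fraction(g11 * b2 - g12 * b1, det)
    return [x1 * p + x2 * q for p, q in zip(v1, v2)]   # a = x1 v1 + x2 v2

ell = lambda v: v[0] + 3 * v[1]                            # hypothetical linear function
weighted = lambda u, w: 2 * u[0] * w[0] + 5 * u[1] * w[1]  # hypothetical inner product
basis = [(1, 0), (1, 1)]                                   # a non-orthogonal basis

print(representing_vector(ell, basis))        # equals (1, 3) for the dot product
a = representing_vector(ell, basis, weighted) # equals (1/2, 3/5)
print(a, weighted(a, (7, -4)) == ell((7, -4)))
```

The two answers differ, which illustrates that the vector representing a given linear function depends on the choice of inner product.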

Remark. For the particular case in which $V = \mathbb{R}^n$ is endowed with the standard dot
product, Theorem 7.10 identifies a row vector representing a linear function with the
corresponding column vector obtained by transposition $\mathbf{a} \mapsto \mathbf{a}^T$. Thus, the naive identification
of a row and a column vector is, in fact, an indication of a much more subtle phenomenon
that relies on the identification of $\mathbb{R}^n$ with its dual based on the Euclidean inner product.
Alternative inner products will lead to alternative, more complicated, identifications of row
and column vectors; see Exercise 7.1.31 for details.

Important. Theorem 7.10 is not true if $V$ is infinite-dimensional. This fact will have
important repercussions for the analysis of the differential equations of continuum
mechanics, which will lead us immediately into the much deeper waters of generalized function
theory, as described in [61].


Exercises 


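7.1.27. Write down a basis for and dimension of the linear function spaces (a) $\mathcal{L}(\mathbb{R}^3, \mathbb{R})$,
(b) $\mathcal{L}(\mathbb{R}^2, \mathbb{R}^2)$, (c) $\mathcal{L}(\mathbb{R}^m, \mathbb{R}^n)$, (d) $\mathcal{L}(\mathcal{P}^{(3)}, \mathbb{R})$, (e) $\mathcal{L}(\mathcal{P}^{(2)}, \mathbb{R}^2)$, (f) $\mathcal{L}(\mathcal{P}^{(2)}, \mathcal{P}^{(2)})$.
Here $\mathcal{P}^{(n)}$ is the space of polynomials of degree $\le n$.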

7.1.28.  True  or  false:  The  set  of  linear  transformations  L:R2 
a  subspace  of  £(R2,R2).  If  true,  what  is  its  dimension? 


7.1.29.  True  or  false:  The  set  of  linear  transformations  L: 


2  such  that  L  f  q  |  = 


0 

0 


is 


is  a  subspace  of  £( 


).  If  true,  what  is  its  dimension? 


(  0\ 

/o\ 

such  that  L 

1 

— 

1 

{QJ 

w 

7.1.30. Consider the linear function $L\colon \mathbb{R}^3 \to \mathbb{R}$ defined by $L(x, y, z) = 3x - y + 2z$. Write down
the vector $\mathbf{a} \in \mathbb{R}^3$ such that $L[\mathbf{v}] = \langle \mathbf{a}, \mathbf{v} \rangle$ when the inner product is (a) the Euclidean
dot product; (b) the weighted inner product $\langle \mathbf{v}, \mathbf{w} \rangle = v_1 w_1 + 2\,v_2 w_2 + 3\,v_3 w_3$; (c) the
inner product defined by the positive definite matrix $K = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}$.

♦ 7.1.31. Let $\mathbb{R}^n$ be equipped with the inner product $\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T K\,\mathbf{w}$. Let $L[\mathbf{v}] = \mathbf{r}\,\mathbf{v}$ where
$\mathbf{r}$ is a row vector of size $1 \times n$. (a) Find a formula for the column vector $\mathbf{a}$ such that (7.12)
holds for the linear function $L\colon \mathbb{R}^n \to \mathbb{R}$. (b) Illustrate your result when $\mathbf{r} = ( 2, -1 )$,
using (i) the dot product, (ii) the weighted inner product $\langle \mathbf{v}, \mathbf{w} \rangle = 3\,v_1 w_1 + 2\,v_2 w_2$,
(iii) the inner product induced by $K =$

♥ 7.1.32. Dual Bases: Given a basis $\mathbf{v}_1, \ldots, \mathbf{v}_n$ of $V$, the dual basis $\ell_1, \ldots, \ell_n$ of $V^*$ consists of
the linear functions uniquely defined by the requirements
$\ell_i(\mathbf{v}_j) = \begin{cases} 1, & i = j, \\ 0, & i \ne j. \end{cases}$
(a) Show that $\ell_i[\mathbf{v}] = x_i$ gives the coordinate of a vector $\mathbf{v} = x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n$
with respect to the given basis. (b) Prove that the dual basis is indeed a basis for the dual
vector space. (c) Prove that if $V = \mathbb{R}^n$ and $A = ( \mathbf{v}_1 \ \ \mathbf{v}_2 \ \ \ldots \ \ \mathbf{v}_n )$ is the $n \times n$ matrix whose
columns are the basis vectors, then the rows of the inverse matrix $A^{-1}$ can be identified as
the corresponding dual basis of $(\mathbb{R}^n)^*$.

7.1.33.  Use  Exercise  7.1.32(c)  to  find  the  dual  basis  for:  (a)  v1  =  (^\^j  ? 


(b)  vx  = 


1 

2 


v2  = 


(_i);  (°)  vi  = 


v2  = 


fl\ 

o 

VP 


v3  = 


(d)  Vj  = 


1\ 

2 

V-37 


v2  = 


°\ 
-3 
1 ) 


-v3  = 


/  — 1  \ 
2 

2/ 


;  (e)  vx  = 


i\ 

i 

0 

>v2  = 

/0\ 

1 

1 

»v3  = 

^  o  o 

»v4  = 

i\ 

-l 

l 

V<V 

V<V 

2  / 

7.1.34.  Let  V ^  denote  the  space  of  quadratic  polynomials  equipped  with  the  L2  inner 

product  (p,q)  =  J  p{x)q(x)dx.  Find  the  polynomial  q  that  represents  the  following 
linear  functions,  i.e.,  such  that  L[p]  =  (q,p):  (a)  L[p]  =  p{ 0),  (b)  L[p]  =  \p{  1), 

(c)  L[p]  =  p{pc)  dx:  (d)  L[p]  =  J  p(pc)dx. 

7.1.35.  Find  the  dual  basis,  as  defined  in  Exercise  7.1.32,  for  the  monomial  basis  of  with 

2  f 1 

respect  to  the  L  inner  product  (p  ,q)  =  p(x)  g(x)  dx. 

7.1.36.  Write  out  a  proof  of  Theorem  7.10  that  does  not  rely  on  finding  an  orthonormal  basis. 


Composition 

Besides  adding  and  multiplying  by  scalars,  one  can  also  compose  linear  functions. 


Lemma 7.11. Let $V, W, Z$ be vector spaces. If $L: V \to W$ and $M: W \to Z$ are linear functions, then the composite function $M \circ L: V \to Z$, defined by $(M \circ L)[\mathbf{v}] = M[L[\mathbf{v}]]$, is also linear.


Proof: This is straightforward:

$$(M \circ L)[c\,\mathbf{v} + d\,\mathbf{w}] = M[L[c\,\mathbf{v} + d\,\mathbf{w}]] = M[c\,L[\mathbf{v}] + d\,L[\mathbf{w}]] = c\,M[L[\mathbf{v}]] + d\,M[L[\mathbf{w}]] = c\,(M \circ L)[\mathbf{v}] + d\,(M \circ L)[\mathbf{w}],$$

where we used, successively, the linearity of $L$ and then of $M$. Q.E.D.

For example, if $L[\mathbf{v}] = A\mathbf{v}$ maps $\mathbb{R}^n$ to $\mathbb{R}^m$ and $M[\mathbf{w}] = B\mathbf{w}$ maps $\mathbb{R}^m$ to $\mathbb{R}^l$, so that $A$ is an $m \times n$ matrix and $B$ is an $l \times m$ matrix, then

$$(M \circ L)[\mathbf{v}] = M[L[\mathbf{v}]] = B(A\mathbf{v}) = (BA)\mathbf{v},$$

and hence the composition $M \circ L: \mathbb{R}^n \to \mathbb{R}^l$ corresponds to the $l \times n$ product matrix $BA$. In other words, on Euclidean space, composition of linear functions is the same as matrix multiplication. And, like matrix multiplication, composition of (linear) functions is not, in general, commutative.
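The identity $B(A\mathbf{v}) = (BA)\mathbf{v}$ and the failure of commutativity can both be checked on small integer matrices. The sketch below is only an illustration; the matrices `A`, `B`, `S`, `T` and the helpers `matvec` and `matmul` are assumptions made up for this purpose.

```python
def matvec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(B, A):
    """Matrix product B A, with both matrices stored as lists of rows."""
    cols = list(zip(*A))
    return [[sum(b * a for b, a in zip(row, col)) for col in cols] for row in B]

# Hypothetical data: A maps R^3 to R^2, B maps R^2 to R^2.
A = [[1, 2, 0], [0, 1, -1]]
B = [[0, 1], [1, 1]]
v = [3, -1, 2]

composed = matvec(B, matvec(A, v))   # M[L[v]] = B(Av)
product = matvec(matmul(B, A), v)    # (BA)v
print(composed, product)             # the two results agree

# Composition is not commutative in general: a reflection S and a shear T.
S = [[0, 1], [1, 0]]
T = [[1, 1], [0, 1]]
print(matmul(S, T), matmul(T, S))    # two different matrices
```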


Example 7.12. Composing two rotations results in another rotation: $R_\varphi \circ R_\theta = R_{\varphi + \theta}$. In other words, if we first rotate by angle $\theta$ and then by angle $\varphi$, the net effect is rotation by angle $\varphi + \theta$. On the matrix level of (7.8), this implies that

$$\begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} \cos(\varphi + \theta) & -\sin(\varphi + \theta) \\ \sin(\varphi + \theta) & \cos(\varphi + \theta) \end{pmatrix}.$$

Multiplying out the left-hand side, we deduce the well-known trigonometric addition formulas

$$\cos(\varphi + \theta) = \cos\varphi \cos\theta - \sin\varphi \sin\theta, \qquad \sin(\varphi + \theta) = \cos\varphi \sin\theta + \sin\varphi \cos\theta.$$

In fact, this constitutes a bona fide proof of these two trigonometric identities!
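As a numerical sanity check of the rotation identity (not a substitute for the proof), one can multiply two rotation matrices and compare the result with the rotation matrix of the summed angle. The angles below are arbitrary, and the helper names `rotation` and `matmul` are assumptions of this sketch.

```python
import math

def rotation(theta):
    """2 x 2 matrix of the counterclockwise rotation through angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(B, A):
    """Matrix product B A, with both matrices stored as lists of rows."""
    cols = list(zip(*A))
    return [[sum(b * a for b, a in zip(row, col)) for col in cols] for row in B]

phi, theta = 0.7, 1.9                       # arbitrary angles in radians
lhs = matmul(rotation(phi), rotation(theta))
rhs = rotation(phi + theta)

# Largest entrywise discrepancy; it should be at the level of rounding error.
max_error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_error)
```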


Example 7.13. One can build up more sophisticated linear operators on function space by adding and composing simpler ones. In particular, higher order derivative operators are obtained by composing the derivative operator $D$, defined in (7.11), with itself. For example,

$$D^2[f] = D \circ D[f] = D[f'] = f''$$

defines the second derivative operator. One needs to exercise due care about the domain of definition, since not every function is differentiable. In general, the $k$th order derivative

$$D^k[f] = f^{(k)}(x)$$

defines a linear operator

$$D^k: C^n[a,b] \to C^{n-k}[a,b] \qquad \text{for all} \quad n \geq k,$$

obtained by composing $D$ with itself $k$ times.

If we further compose $D^k$ with the linear operation of multiplication by a given function $a(x)$, we obtain the linear operator $(a D^k)[f] = a(x)\, f^{(k)}(x)$. Finally, a general linear ordinary differential operator of order $n$,

$$L = a_n(x) D^n + a_{n-1}(x) D^{n-1} + \cdots + a_1(x) D + a_0(x), \tag{7.15}$$

is obtained by summing such operators. If the coefficient functions $a_0(x), \ldots, a_n(x)$ are continuous, then

$$L[u] = a_n(x) \frac{d^n u}{dx^n} + a_{n-1}(x) \frac{d^{n-1} u}{dx^{n-1}} + \cdots + a_1(x) \frac{du}{dx} + a_0(x)\, u \tag{7.16}$$

defines a linear operator from $C^n[a,b]$ to $C^0[a,b]$. The most important case (but certainly not the only one arising in applications) is when the coefficients $a_i(x) = c_i$ are all constant.
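On polynomials, the construction of a differential operator by composing and adding simpler operators can be imitated directly on coefficient lists. The sketch below is an illustration under stated assumptions: polynomials are stored as `[c0, c1, c2, ...]`, and the operator $L = x D^2 + 3D + 1$ is a hypothetical example of the form (7.15), not one taken from the text.

```python
def D(p):
    """Derivative of the polynomial c0 + c1 x + c2 x^2 + ... given as [c0, c1, ...]."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def times_x(p):
    """Multiplication by the function a(x) = x."""
    return [0] + list(p)

def add(p, q):
    """Sum of two polynomials, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def compose(*ops):
    """Composite operator: the last operator listed acts first."""
    def composite(p):
        for op in reversed(ops):
            p = op(p)
        return p
    return composite

D2 = compose(D, D)  # second derivative operator D^2 = D o D

def L(p):
    """Hypothetical operator L = x D^2 + 3 D + 1."""
    return add(add(times_x(D2(p)), [3 * c for c in D(p)]), p)

p = [1, 0, 2, 1]    # 1 + 2 x^2 + x^3
q = [0, 1]          # x
print(L(p))         # coefficients of x p'' + 3 p' + p
print(L(add(p, q)) == add(L(p), L(q)))  # additivity on this pair
```

The final line checks additivity only on one pair of polynomials; it illustrates linearity and does not prove it.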


Exercises 

7.1.37. For each of the following pairs of linear functions $S, T: \mathbb{R}^2 \to \mathbb{R}^2$, describe the compositions $S \circ T$ and $T \circ S$. Do the functions commute?

(a)  S  =  counterclockwise  rotation  by  60°;  T  =  clockwise  rotation  by  120°; 

(b)  S  =  reflection  in  the  line  y  =  x;  T  =  rotation  by  180°; 

(c)  S  =  reflection  in  the  x-axis;  T  =  reflection  in  the  y-axis; 

(d)  S  =  reflection  in  the  line  y  =  x;  T  =  reflection  in  the  line  y  =  2x; 

(e)  S  =  orthogonal  projection  on  the  x-axis;  T  =  orthogonal  projection  on  the  y-axis; 

(f)  S  =  orthogonal  projection  on  the  x-axis;  T  =  orthogonal  projection  on  the  line  y  =  x; 

(g)  S  =  orthogonal  projection  on  the  x-axis;  T  =  rotation  by  180°; 

(h)  S  =  orthogonal  projection  on  the  x-axis;  T  =  counterclockwise  rotation  by  90°; 

(i) S = orthogonal projection on the line y = −2x; T = reflection in the line y = x.

7.1.38. Find a matrix representative for the linear functions (a) L:


to 


1 

3 


and  e2  to 


-1 

2 


;  (b)  M: 


and  (c)  TV: 


that  takes 


1 

-3 


to 


.2  that  takes  e-^  to  ^ 


-1 

-3 


that  maps  e1 
and  e2  to  '  ^ 


-1 

-3 


and 


1 

2 


to  (  2  )  •  (d)  Explain  why 


M  =  N  o  L.  (e)  Verify  part  (d)  by  multiplying  the  matrix  representatives. 


7.1.39. On the vector space $\mathbb{R}^3$, let $R$ denote counterclockwise rotation around the x-axis by 90° and $S$ counterclockwise rotation around the z-axis by 90°. (a) Find matrix representatives for $R$ and $S$. (b) Show that $R \circ S \neq S \circ R$. Explain what happens to the standard basis vectors under the two compositions. (c) Give an experimental demonstration of the noncommutativity of $R$ and $S$ by physically rotating a solid object, e.g., this book, in the prescribed manners.

7.1.40. Let $P$ denote orthogonal projection of $\mathbb{R}^3$ onto the plane $V = \{ z = x + y \}$ and $Q$ denote orthogonal projection onto the plane $W = \{ z = x - y \}$. Is the composition $R = Q \circ P$ the same as orthogonal projection onto the line $L = V \cap W$? Verify your conclusion by computing the matrix representatives of $P$, $Q$, and $R$.

7.1.41.  (a)  Write  the  linear  operator  L[f(x)]  =  f(b)  as  a  composition  of  two  linear  functions. 
Do your linear functions commute? (b) For which values of a, b, c, d, e is

L[f(x)]  =  a  f'(b)  +  c  f(d)  +  e  a  linear  function? 

7.1.42.  Let  L  =  xD  +  1,  and  M  =  D  —  x  be  differential  operators.  Find  L°M  and  M°L.  Do 
the  differential  operators  commute? 


7.1.43.  Show  that  the  space  of  constant  coefficient  linear  differential  operators  of  order  <  n  is  a 
vector  space.  Determine  its  dimension  by  exhibiting  a  basis. 


7.1.44. (a) Explain why the differential operator $L = D \circ M_a \circ D$ obtained by composing the linear operators of differentiation $D[f(x)] = f'(x)$ and multiplication $M_a[f(x)] = a(x)\, f(x)$ by a given function $a(x)$ defines a linear operator. (b) Re-express $L$ as a linear differential operator of the form (7.16).

0  7.1.45.  (a)  Show  that  composition  of  linear  functions  is  associative:  (L  °M)  °  N  =  L  °  (M  °  N). 
Be  precise  about  the  domain  and  codomain  spaces  involved,  (b)  How  do  you  know 

the  result  is  a  linear  function?  (c)  Explain  why  this  proves  associativity  of  matrix 
multiplication. 


7.1.46. Show that if $p(x, y)$ is any polynomial, then $L = p(\partial_x, \partial_y)$ defines a linear, constant coefficient partial differential operator. For example, if $p(x, y) = x^2 + y^2$, then $L = \partial_x^2 + \partial_y^2$ is the Laplacian operator $\Delta[f] = \dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2}$.

♥ 7.1.47. The commutator of two linear transformations $L, M: V \to V$ on a vector space $V$ is

$$K = [L, M] = L \circ M - M \circ L. \tag{7.17}$$

(a) Prove that the commutator $K$ is a linear transformation on $V$. (b) Explain why Exercise 1.2.30 is a special case. (c) Prove that $L$ and $M$ commute if and only if $[L, M] = O$. (d) Compute the commutators of the linear transformations defined by the following pairs of matrices:


(0 


1 

1 


0 

1 

+ 


1 
0 
1 

=  o 


/ 


V 


1 

0 

0 


0 

1 

0 


0\ 

0 

-17 


(7.18) 


(e) Prove that the Jacobi identity

$$[[L, M], N] + [[N, L], M] + [[M, N], L] = O$$

is valid for any three linear transformations. (f) Verify the Jacobi identity for the first three matrices in part (d). (g) Prove that the commutator $B[L, M] = [L, M]$ defines a bilinear map $B: \mathcal{L}(V,V) \times \mathcal{L}(V,V) \to \mathcal{L}(V,V)$ on the Cartesian product space, cf. Exercise 7.1.18.

♦ 7.1.48. (a) In (one-dimensional) quantum mechanics, the differentiation operator $P[f(x)] = f'(x)$ represents the momentum of a particle, while the operator $Q[f(x)] = x f(x)$ of multiplication by the function $x$ represents its position. Prove that the position and momentum operators satisfy the Heisenberg Commutation Relations $[P, Q] = P \circ Q - Q \circ P = I$. (b) Prove that there are no matrices $P, Q$ that satisfy the Heisenberg Commutation Relations. Hint: Use Exercise 1.2.31.

Remark. The noncommutativity of quantum mechanical observables lies at the heart of the Uncertainty Principle. The result in part (b) is one of the main reasons why quantum mechanics must be an intrinsically infinite-dimensional theory.


♥ 7.1.49. Let $\mathcal{D}^{(1)}$ denote the set of all first order linear differential operators $L = p(x) D + q(x)$ where $p, q$ are polynomials. (a) Prove that $\mathcal{D}^{(1)}$ is a vector space. Is it finite-dimensional or infinite-dimensional? (b) Prove that the commutator (7.17) of $L, M \in \mathcal{D}^{(1)}$ is a first order differential operator $[L, M] \in \mathcal{D}^{(1)}$ by writing out an explicit formula. (c) Verify the Jacobi identity (7.18) for the first order operators $L = D$, $M = x D + 1$, and $N = x D + 2x$.


7.1.50. Do the conclusions of Exercise 7.1.49(a–b) hold for the space of second order differential operators $L = p(x) D^2 + q(x) D + r(x)$, where $p, q, r$ are polynomials?
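For matrices, the commutator (7.17) and the Jacobi identity are easy to experiment with. The following sketch uses three hypothetical $2 \times 2$ integer matrices, chosen only for illustration and not taken from the exercises; the helper names are likewise assumptions.

```python
def matmul(B, A):
    """Matrix product B A, with both matrices stored as lists of rows."""
    cols = list(zip(*A))
    return [[sum(b * a for b, a in zip(row, col)) for col in cols] for row in B]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def commutator(L, M):
    """[L, M] = L M - M L."""
    return sub(matmul(L, M), matmul(M, L))

# Hypothetical matrices, for illustration only.
L = [[0, 1], [0, 0]]
M = [[0, 0], [1, 0]]
N = [[1, 0], [0, -1]]

print(commutator(L, M))  # nonzero, so L and M do not commute

# Left-hand side of the Jacobi identity: [[L,M],N] + [[N,L],M] + [[M,N],L].
jacobi = add(add(commutator(commutator(L, M), N),
                 commutator(commutator(N, L), M)),
             commutator(commutator(M, N), L))
print(jacobi)            # the zero matrix
```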


Inverses 

The  inverse  of  a  linear  function  is  defined  in  direct  analogy  with  the  Definition  1.13  of  the 
inverse  of  a  (square)  matrix. 

Definition 7.14. Let $L: V \to W$ be a linear function. If $M: W \to V$ is a function such that both compositions

$$L \circ M = I_W, \qquad M \circ L = I_V, \tag{7.19}$$

are equal to the identity function, then we call $M$ the inverse of $L$ and write $M = L^{-1}$.


The two conditions (7.19) require

$$L[M[\mathbf{w}]] = \mathbf{w} \quad \text{for all} \quad \mathbf{w} \in W, \qquad \text{and} \qquad M[L[\mathbf{v}]] = \mathbf{v} \quad \text{for all} \quad \mathbf{v} \in V.$$

In Exercise 7.1.55, you are asked to prove that, when it exists, the inverse is unique. Of course, if $M = L^{-1}$ is the inverse of $L$, then $L = M^{-1}$ is the inverse of $M$ since the conditions are symmetric, and, in such cases, $(L^{-1})^{-1} = L$.

Lemma  7.15.  If  it  exists,  the  inverse  of  a  linear  function  is  also  a  linear  function. 

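Proof: Let $L, M$ satisfy the conditions of Definition 7.14. Given $\mathbf{w}, \tilde{\mathbf{w}} \in W$, we note

$$\mathbf{w} = (L \circ M)[\mathbf{w}] = L[\mathbf{v}], \qquad \tilde{\mathbf{w}} = (L \circ M)[\tilde{\mathbf{w}}] = L[\tilde{\mathbf{v}}], \qquad \text{where} \quad \mathbf{v} = M[\mathbf{w}], \quad \tilde{\mathbf{v}} = M[\tilde{\mathbf{w}}].$$

Therefore, given scalars $c, d$, and using only the linearity of $L$,

$$M[c\,\mathbf{w} + d\,\tilde{\mathbf{w}}] = M[c\,L[\mathbf{v}] + d\,L[\tilde{\mathbf{v}}]] = (M \circ L)[c\,\mathbf{v} + d\,\tilde{\mathbf{v}}] = c\,\mathbf{v} + d\,\tilde{\mathbf{v}} = c\,M[\mathbf{w}] + d\,M[\tilde{\mathbf{w}}],$$

proving linearity of $M$. Q.E.D.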


If $V = \mathbb{R}^n$, $W = \mathbb{R}^m$, so that $L$ and $M$ are given by matrix multiplication, by $A$ and $B$ respectively, then the conditions (7.19) reduce to the usual conditions

$$A B = I, \qquad B A = I,$$

for matrix inversion, cf. (1.37). Therefore, $B = A^{-1}$ is the inverse matrix. In particular, for $L: \mathbb{R}^n \to \mathbb{R}^m$ to have an inverse, we must have $m = n$, and its coefficient matrix $A$ must be nonsingular.

The  invertibility  of  linear  transformations  on  infinite-dimensional  function  spaces  is 
more  subtle.  Here  is  a  familiar  example  from  calculus. 


Example 7.16. The Fundamental Theorem of Calculus says, roughly, that differentiation $D[f] = f'$ and (indefinite) integration $J[f] = g$, where $g(x) = \int_a^x f(y)\,dy$, are “inverse” operations. More precisely, the derivative of the indefinite integral of $f$ is equal to $f$, and hence

$$D[J[f]] = D[g] = g' = f, \qquad \text{since} \quad g'(x) = \frac{d}{dx} \int_a^x f(y)\,dy = f(x).$$

In other words, the composition $D \circ J = I_{C^0[a,b]}$ defines the identity operator on the function space $C^0[a,b]$. On the other hand, if we integrate the derivative of a continuously differentiable function $f \in C^1[a,b]$, we obtain $J[D[f]] = J[f'] = h$, where

$$h(x) = \int_a^x f'(y)\,dy = f(x) - f(a) \neq f(x) \qquad \text{unless} \quad f(a) = 0.$$

Therefore, the composition is not the identity operator: $J \circ D \neq I_{C^1[a,b]}$. In other words, the differentiation operator $D$ is a left inverse for the integration operator $J$ but not a right inverse!

If we restrict $D$ to the subspace $V = \{\, f \mid f(a) = 0 \,\} \subset C^1[a,b]$ consisting of all continuously differentiable functions that vanish at the left-hand endpoint, then $J: C^0[a,b] \to V$ and $D: V \to C^0[a,b]$ are, by the preceding argument, inverse linear operators: $D \circ J = I_{C^0[a,b]}$, and $J \circ D = I_V$. Note that $V \subsetneq C^1[a,b] \subsetneq C^0[a,b]$. Thus, we discover the curious and disconcerting infinite-dimensional phenomenon that $J$ defines a one-to-one, invertible, linear map from a vector space $C^0[a,b]$ to a proper subspace $V \subsetneq C^0[a,b]$. This paradoxical situation cannot occur in finite dimensions. A linear map $L: \mathbb{R}^n \to \mathbb{R}^n$ can be invertible only when its image is the entire space, because it represents multiplication by a nonsingular square matrix.
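The asymmetry between $D \circ J$ and $J \circ D$ can be seen concretely on polynomials, where both operators act on coefficient lists. This is a minimal sketch under stated assumptions: polynomials are stored as `[c0, c1, c2, ...]`, the base point `a = 1` and the sample polynomial are arbitrary choices, and the helper names are hypothetical.

```python
from fractions import Fraction

def evaluate(p, x):
    """Value at x of the polynomial with coefficient list p = [c0, c1, ...]."""
    return sum(c * x**k for k, c in enumerate(p))

def D(p):
    """Differentiation operator on coefficient lists."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def J(p, a):
    """Integration operator: the antiderivative of p that vanishes at x = a."""
    antiderivative = [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(p)]
    antiderivative[0] = -evaluate(antiderivative, a)  # enforce g(a) = 0
    return antiderivative

a = 1
f = [2, 0, 3]                 # f(x) = 2 + 3 x^2, so f(a) = 5 is nonzero

print(D(J(f, a)) == f)        # D o J returns f
print(J(D(f), a))             # J o D returns f(x) - f(a) = 3 x^2 - 3
```

Choosing a polynomial with $f(a) = 0$, for instance `[-3, 0, 3]`, makes the second composition return the original polynomial, in line with the restriction to the subspace $V$.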


Two vector spaces $V, W$ are said to be isomorphic, written $V \simeq W$, if there exists an invertible linear function $L: V \to W$. For example, if $V$ is finite-dimensional, then $V \simeq W$ if and only if $W$ has the same dimension as $V$. In particular, if $V$ has dimension $n$, then $V \simeq \mathbb{R}^n$. One way to construct the required invertible linear map is to choose a basis $\mathbf{v}_1, \ldots, \mathbf{v}_n$ of $V$, and map it to the standard basis of $\mathbb{R}^n$, so $L[\mathbf{v}_k] = \mathbf{e}_k$ for $k = 1, \ldots, n$. In general, given $\mathbf{v} = x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n$, then, by linearity,

$$L[\mathbf{v}] = L[x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n] = x_1 L[\mathbf{v}_1] + \cdots + x_n L[\mathbf{v}_n] = x_1 \mathbf{e}_1 + \cdots + x_n \mathbf{e}_n = (x_1, x_2, \ldots, x_n)^T = \mathbf{x},$$

and hence $L$ maps $\mathbf{v}$ to the column vector $\mathbf{x} \in \mathbb{R}^n$ whose entries are its coordinates with respect to the chosen basis. The inverse $L^{-1}: \mathbb{R}^n \to V$ maps $\mathbf{x} \in \mathbb{R}^n$ to the element $L^{-1}[\mathbf{x}] = x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n \in V$. As the above example makes clear, isomorphism of infinite-dimensional vector spaces is more subtle, and one often imposes additional restrictions on the allowable linear maps.

Exercises

7.1.51. Determine which of the following linear functions $L: \mathbb{R}^2 \to \mathbb{R}^2$ has an inverse, and, if so, describe it: (a) the scaling transformation that doubles the length of each vector;

(b) clockwise rotation by 45°; (c) reflection through the y-axis; (d) orthogonal projection

/ 1  2 

onto  the  line  y  =  x;  (e)  the  shearing  transformation  defined  by  the  matrix  (  ^ 

7.1.52.  For  each  of  the  linear  functions  in  Exercise  7.1.51,  write  down  its  matrix  representative, 
the  matrix  representative  of  its  inverse,  and  verify  that  the  matrices  are  mutual  inverses. 

7.1.53. Let $L: \mathbb{R}^2 \to \mathbb{R}^2$ be the linear function such that $L[\mathbf{e}_1] = (1, -1)^T$, $L[\mathbf{e}_2] = (3, -2)^T$. Find $L^{-1}[\mathbf{e}_1]$ and $L^{-1}[\mathbf{e}_2]$.

7.1.54. Let $L: \mathbb{R}^3 \to \mathbb{R}^3$ be the linear function such that $L[\mathbf{e}_1] = (2, 1, -1)^T$, $L[\mathbf{e}_2] = (1, 2, 1)^T$, $L[\mathbf{e}_3] = (-1, 2, 2)^T$. Find $L^{-1}[\mathbf{e}_1]$, $L^{-1}[\mathbf{e}_2]$, and $L^{-1}[\mathbf{e}_3]$.

0  7.1.55.  Prove  that  the  inverse  of  a  linear  transformation  is  unique;  i.e.,  given  L,  there  is  at 
most  one  linear  transformation  M  that  can  satisfy  (7.19). 


♦ 7.1.56. Let $L: V \to W$ be a linear function. Suppose $M, N: W \to V$ are linear functions that
satisfy  L°M  =  I  v  =  N  °L.  Prove  that  M  =  N  =  L-1.  Thus,  a  linear  function  may  have 
only  a  left  or  a  right  inverse,  but  if  it  has  both,  then  they  must  be  the  same. 

7.1.57.  Give  an  example  of  a  matrix  with  a  left  inverse,  but  not  a  right  inverse.  Is  your  left 
inverse  unique? 

♥ 7.1.58. Suppose $\mathbf{v}_1, \ldots, \mathbf{v}_n$ is a basis for $V$ and $\mathbf{w}_1, \ldots, \mathbf{w}_n$ a basis for $W$. (a) Prove that there is a unique linear function $L: V \to W$ such that $L[\mathbf{v}_i] = \mathbf{w}_i$ for $i = 1, \ldots, n$. (b) Prove that $L$ is invertible. (c) If $V = W = \mathbb{R}^n$, find a formula for the matrix representative of the linear functions $L$ and $L^{-1}$. (d) Apply your construction to produce a linear function that


(  0\ 

/i\ 

/°\ 

/°\ 

(Hi)  v-l  = 

1 

>  v2  = 

0  ,  v3  = 

1  to  w1  = 

0 

,  W2  = 

1  >  w3  = 

0 

Ki) 

w 

{OJ 

\0/ 

\0J 

U/ 

7.1.59. Suppose $V, W \subset \mathbb{R}^n$ are subspaces of the same dimension. Prove that there is an invertible linear function $L: \mathbb{R}^n \to \mathbb{R}^n$ that takes $V$ to $W$. Hint: Use Exercise 7.1.58.


♦ 7.1.60. Let $W, Z$ be complementary subspaces of a vector space $V$, as in Exercise 2.2.24. Let $V/W$ denote the quotient vector space, as defined in Exercise 2.2.29. Show that the map $L: Z \to V/W$ that maps $L[\mathbf{z}] = [\mathbf{z}]_W$ defines an invertible linear map, and hence $Z \simeq V/W$ are isomorphic vector spaces.

♦ 7.1.61. Let $L: V \to W$ be a linear map. (a) Suppose $V, W$ are finite-dimensional vector spaces, and let $A$ be a matrix representative of $L$. Explain why we can identify $\operatorname{coker} A \simeq W / \operatorname{img} A$ and $\operatorname{coimg} A \simeq V / \ker A$ as quotient vector spaces, cf. Exercise 2.2.29.

Remark. These characterizations are used to give intrinsic definitions of the cokernel and coimage of a general linear function $L: V \to W$ without any reference to a transpose (or, as defined below, adjoint) operation. Namely, set $\operatorname{coker} L \simeq W / \operatorname{img} L$ and $\operatorname{coimg} L \simeq V / \ker L$.

(b) The index of the linear map is defined as $\operatorname{index} L = \dim \ker L - \dim \operatorname{coker} L$, using the above intrinsic definitions. Prove that, when $V, W$ are finite-dimensional, $\operatorname{index} L = \dim V - \dim W$.

Figure 7.4. Rotation.


♦ 7.1.62. Let $V$ be a finite-dimensional real inner product space and let $V^*$ be its dual. Using Theorem 7.10, prove that the map $J: V^* \to V$ that takes the linear function $\ell \in V^*$ to the vector $J[\ell] = \mathbf{a} \in V$ satisfying $\ell[\mathbf{v}] = \langle \mathbf{a}, \mathbf{v} \rangle$ defines a linear isomorphism between the inner product space and its dual: $V^* \simeq V$.

7.1.63. (a) Prove that $L[p] = p' + p$ defines an invertible linear map on the space $\mathcal{P}^{(2)}$ of quadratic polynomials. Find a formula for its inverse.

(b) Does the derivative $D[p] = p'$ have either a left or a right inverse on $\mathcal{P}^{(2)}$?

♥ 7.1.64. (a) Show that the set of all functions of the form $f(x) = (a x^2 + b x + c)\, e^x$ for $a, b, c \in \mathbb{R}$ is a vector space. What is its dimension? (b) Show that the derivative $D[f(x)] = f'(x)$ defines an invertible linear transformation on this vector space, and determine its inverse. (c) Generalize your result in part (b) to the infinite-dimensional vector space consisting of all functions of the form $p(x)\, e^x$, where $p(x)$ is an arbitrary polynomial.


7.2  Linear  Transformations 


Consider a linear function $L: \mathbb{R}^n \to \mathbb{R}^n$ that maps n-dimensional Euclidean space to itself. The function $L$ maps a point $\mathbf{x} \in \mathbb{R}^n$ to its image point $L[\mathbf{x}] = A\mathbf{x}$, where $A$ is its $n \times n$ matrix representative. As such, it can be assigned a geometrical interpretation that leads to further insight into the nature and scope of linear functions on Euclidean space. The geometrically inspired term linear transformation is often used to refer to such linear functions. The two-, three-, and four-dimensional (viewing time as the fourth dimension of space-time) cases have particular relevance to our physical universe. Many of the notable maps that appear in geometry, computer graphics, elasticity, symmetry, crystallography, and Einstein’s special relativity, to name a few, are defined by linear transformations.

Most of the important classes of linear transformations already appear in the two-dimensional case. Every linear function $L: \mathbb{R}^2 \to \mathbb{R}^2$ has the form

$$L\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a x + b y \\ c x + d y \end{pmatrix}, \qquad \text{where} \quad A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \tag{7.20}$$

is an arbitrary $2 \times 2$ matrix. We have already encountered the rotation matrices

$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \tag{7.21}$$

whose effect is to rotate every vector in $\mathbb{R}^2$ through an angle $\theta$; in Figure 7.4 we illustrate the effect on a couple of square regions in the plane. Planar rotations coincide with $2 \times 2$ proper orthogonal matrices, meaning matrices $Q$ that satisfy

$$Q^T Q = I, \qquad \det Q = +1. \tag{7.22}$$

The improper orthogonal matrices, i.e., those with determinant $-1$, define reflections. For example, the matrix $\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$ corresponds to the linear transformation

$$L\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -x \\ y \end{pmatrix}, \tag{7.23}$$

which reflects the plane through the y-axis. It can be visualized by thinking of the y-axis as a mirror, as illustrated in Figure 7.5. Another simple example is the improper orthogonal matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. The corresponding linear transformation

$$L\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} y \\ x \end{pmatrix} \tag{7.24}$$

is a reflection through the diagonal line $y = x$, as illustrated in Figure 7.6.

Figure 7.5. Reflection through the y-axis.

Figure 7.6. Reflection through the Diagonal.
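The two conditions in (7.22) translate into a short test that separates planar rotations from reflections. The sketch below is illustrative only: the function name `classify` and the tolerance are assumptions, and the sample matrices are the rotation through 30° and the two standard reflections through the y-axis and through the diagonal.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(B, A):
    """Matrix product B A, with both matrices stored as lists of rows."""
    cols = list(zip(*A))
    return [[sum(b * a for b, a in zip(row, col)) for col in cols] for row in B]

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def classify(Q, tol=1e-12):
    """Label a 2 x 2 matrix as a rotation, a reflection, or not orthogonal."""
    QtQ = matmul(transpose(Q), Q)
    orthogonal = all(abs(QtQ[i][j] - (1 if i == j else 0)) < tol
                     for i in range(2) for j in range(2))
    if not orthogonal:
        return "not orthogonal"
    return "rotation" if det2(Q) > 0 else "reflection"

theta = math.radians(30)
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

print(classify(rotation))            # proper orthogonal: det = +1
print(classify([[-1, 0], [0, 1]]))   # improper orthogonal: det = -1
print(classify([[0, 1], [1, 0]]))    # improper orthogonal: det = -1
print(classify([[2, 0], [0, 1]]))    # a stretch fails Q^T Q = I
```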

A similar classification of orthogonal matrices carries over to three-dimensional (and even higher-dimensional) space. The proper orthogonal matrices correspond to rotations and the improper orthogonal matrices to reflections, or, more generally, reflections combined with rotations. For example, the proper orthogonal matrix

$$\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{7.25}$$

corresponds to a counterclockwise rotation through an angle $\theta$ around the z-axis, while

$$\begin{pmatrix} \cos\varphi & 0 & -\sin\varphi \\ 0 & 1 & 0 \\ \sin\varphi & 0 & \cos\varphi \end{pmatrix} \tag{7.26}$$

corresponds to a clockwise rotation through an angle $\varphi$ around the y-axis. In general, a proper orthogonal matrix $Q = ( \mathbf{u}_1 \ \mathbf{u}_2 \ \mathbf{u}_3 )$ with columns $\mathbf{u}_i = Q \mathbf{e}_i$ corresponds to the rotation in which the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ are rotated to new positions given by the orthonormal basis $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$. It can be shown (see Exercise 8.2.44) that every $3 \times 3$ proper orthogonal matrix corresponds to a rotation around a line through the origin in $\mathbb{R}^3$, the axis of the rotation, as sketched in Figure 7.7.

Figure 7.7. A Three-Dimensional Rotation.

Figure 7.8. Stretch along the x-axis.

Since the product of two (proper) orthogonal matrices is also (proper) orthogonal, the composition of two rotations is also a rotation. Unlike the planar case, the order in which the rotations are performed is important! Multiplication of $n \times n$ orthogonal matrices is not commutative when $n \geq 3$. For example, rotating first around the z-axis and then rotating around the y-axis does not have the same effect as first rotating around the y-axis and then around the z-axis. If you don’t believe this, try it out with a solid object such as this book. Rotate through 90°, say, around each axis; the final configuration of the book will depend upon the order in which you do the rotations. Then prove this mathematically by showing that the two rotation matrices (7.25) and (7.26) do not commute.
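The book-rotating experiment can be mirrored with exact integer matrices by taking both angles equal to 90°. The sketch below is an illustration, not the text's own computation; in particular, the sign convention chosen for the rotation around the y-axis is an assumption, and the conclusion (the two products differ) does not depend on it.

```python
def matmul(B, A):
    """Matrix product B A, with both matrices stored as lists of rows."""
    cols = list(zip(*A))
    return [[sum(b * a for b, a in zip(row, col)) for col in cols] for row in B]

# Rotation by 90 degrees around the z-axis (counterclockwise).
Rz = [[0, -1, 0],
      [1,  0, 0],
      [0,  0, 1]]

# Rotation by 90 degrees around the y-axis (sign convention assumed here).
Ry = [[0, 0, -1],
      [0, 1,  0],
      [1, 0,  0]]

z_after_y = matmul(Rz, Ry)   # rotate around y first, then around z
y_after_z = matmul(Ry, Rz)   # rotate around z first, then around y

print(z_after_y)
print(y_after_z)
print(z_after_y == y_after_z)  # the order of the rotations matters
```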

Other important linear transformations arise from elementary matrices. First, the elementary matrices corresponding to the third type of row operation (multiplying a row by a scalar) correspond to simple stretching transformations. For example, if $A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$, then the linear transformation $L\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2x \\ y \end{pmatrix}$ has the effect of stretching along the x-axis by a factor of 2; see Figure 7.8. A negative diagonal entry corresponds to a reflection followed by a stretch. For example, the elementary matrix $\begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix}$ corresponds to a reflection through the y-axis followed by a stretch along the x-axis. In this case, the order of these operations is immaterial, since the matrices commute.

Figure 7.9. Shear in the x Direction.

In the $2 \times 2$ case, there is only one type of elementary row interchange, namely the matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, which corresponds to a reflection through the diagonal $y = x$, as in (7.24).

The elementary matrices of type #1 correspond to shearing transformations of the plane. For example, the matrix $\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$ represents the linear transformation $L\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + 2y \\ y \end{pmatrix}$, which has the effect of shearing the plane along the x-axis. The constant 2 will be called the shear factor, and can be either positive or negative. Under the shearing transformation, each point moves parallel to the x-axis by an amount proportional to its (signed) distance from the axis. Similarly, the elementary matrix $\begin{pmatrix} 1 & 0 \\ -3 & 1 \end{pmatrix}$ represents the linear transformation $L\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y - 3x \end{pmatrix}$, which is a shear along the y-axis of magnitude $-3$. As illustrated in Figure 7.9, shears map rectangles to parallelograms; distances are altered, but areas are unchanged.
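The claim that a shear changes distances but not areas can be checked on the unit square. In this sketch, the helper names `apply` and `polygon_area` are assumptions, and the area of the image polygon is computed with the shoelace formula; a stretch by a factor of 2 is included for contrast.

```python
def apply(A, p):
    """Image of the point p = (x, y) under the 2 x 2 matrix A."""
    x, y = p
    return (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)

def polygon_area(points):
    """Area of a simple polygon from its vertices, via the shoelace formula."""
    total = 0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shear = [[1, 2], [0, 1]]      # shear along the x-axis with shear factor 2
stretch = [[2, 0], [0, 1]]    # stretch along the x-axis by a factor of 2

sheared = [apply(shear, p) for p in unit_square]
stretched = [apply(stretch, p) for p in unit_square]

print(sheared, polygon_area(sheared))       # a parallelogram with the same area
print(stretched, polygon_area(stretched))   # a rectangle with doubled area
```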

All of the preceding linear maps are invertible, and so are represented by nonsingular matrices. Besides the zero map/matrix, which sends every point $\mathbf{x} \in \mathbb{R}^2$ to the origin, the simplest singular map is $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, corresponding to the linear transformation $L\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ 0 \end{pmatrix}$, which defines the orthogonal projection of the vector $(x, y)^T$ onto the x-axis. Other rank one matrices represent various kinds of projections from the plane to a line through the origin; see Exercise 7.2.16 for details.

A similar classification of linear maps can be established in higher dimensions. The linear transformations constructed from elementary matrices can be built up from the following four basic types:

(i) a stretch in a single coordinate direction;

(ii) a reflection through a coordinate plane;†

(iii) a reflection through a diagonal plane;

(iv) a shear along a coordinate axis.

Moreover, we already proved (see (1.47)) that every nonsingular matrix can be written as a product of elementary matrices. This has the remarkable consequence that every invertible linear transformation can be constructed from a sequence of elementary stretches, reflections, and shears. In addition, there is one further, non-invertible, type of basic linear transformation:

(v) an orthogonal projection onto a lower-dimensional subspace.

All linear transformations of $\mathbb{R}^n$ can be built up, albeit non-uniquely, as a composition of these five basic types.

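Example 7.17. Consider the matrix $A = \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix}$ corresponding to a plane rotation through $\theta = 30°$, cf. (7.21). Rotations are not elementary linear transformations. To express this particular rotation as a product of elementary matrices, we need to perform the Gauss-Jordan Elimination procedure to reduce it to the identity matrix. Let us indicate the basic steps:

$$E_1 = \begin{pmatrix} 1 & 0 \\ -\frac{1}{\sqrt{3}} & 1 \end{pmatrix}, \qquad E_1 A = \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ 0 & \frac{2}{\sqrt{3}} \end{pmatrix},$$

$$E_2 = \begin{pmatrix} 1 & 0 \\ 0 & \frac{\sqrt{3}}{2} \end{pmatrix}, \qquad E_2 E_1 A = \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ 0 & 1 \end{pmatrix},$$

$$E_3 = \begin{pmatrix} \frac{2}{\sqrt{3}} & 0 \\ 0 & 1 \end{pmatrix}, \qquad E_3 E_2 E_1 A = \begin{pmatrix} 1 & -\frac{1}{\sqrt{3}} \\ 0 & 1 \end{pmatrix},$$

$$E_4 = \begin{pmatrix} 1 & \frac{1}{\sqrt{3}} \\ 0 & 1 \end{pmatrix}, \qquad E_4 E_3 E_2 E_1 A = I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

We conclude that

$$\begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix} = A = E_1^{-1} E_2^{-1} E_3^{-1} E_4^{-1} = \begin{pmatrix} 1 & 0 \\ \frac{1}{\sqrt{3}} & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \frac{2}{\sqrt{3}} \end{pmatrix} \begin{pmatrix} \frac{\sqrt{3}}{2} & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -\frac{1}{\sqrt{3}} \\ 0 & 1 \end{pmatrix}.$$

As a result, a 30° rotation can be effected by composing the following elementary transformations in the prescribed order, bearing in mind that the last matrix in the product will act first on the vector $\mathbf{x}$:

(1) First, a shear in the x direction with shear factor $-\frac{1}{\sqrt{3}}$.

(2) Then a stretch (or, rather, a contraction) in the direction of the x-axis by a factor of $\frac{\sqrt{3}}{2}$.

(3) Then a stretch in the y direction by the reciprocal factor $\frac{2}{\sqrt{3}}$.

(4) Finally, a shear in the direction of the y-axis with shear factor $\frac{1}{\sqrt{3}}$.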

The fact that this combination of elementary transformations results in a pure rotation is surprising and non-obvious.

† In n-dimensional space, this should read “hyperplane”, i.e., a subspace of dimension $n - 1$.
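Since the result of Example 7.17 is far from obvious, a floating-point check may be reassuring. The sketch below multiplies two shears and two stretches, with factors $\pm 1/\sqrt{3}$, $\sqrt{3}/2$, and $2/\sqrt{3}$ as obtained by carrying out the Gauss-Jordan reduction of the 30° rotation matrix, and compares the product with that rotation matrix. The variable names are assumptions of this illustration.

```python
import math

def matmul(B, A):
    """Matrix product B A, with both matrices stored as lists of rows."""
    cols = list(zip(*A))
    return [[sum(b * a for b, a in zip(row, col)) for col in cols] for row in B]

s = math.sqrt(3)
A = [[s / 2, -1 / 2],
     [1 / 2, s / 2]]              # rotation through 30 degrees

# Factors listed left to right as they appear in the product;
# the last one acts first on a vector.
factors = [
    [[1, 0], [1 / s, 1]],         # shear along the y-axis, factor 1/sqrt(3)
    [[1, 0], [0, 2 / s]],         # stretch in the y direction, factor 2/sqrt(3)
    [[s / 2, 0], [0, 1]],         # contraction in the x direction, factor sqrt(3)/2
    [[1, -1 / s], [0, 1]],        # shear along the x-axis, factor -1/sqrt(3)
]

product = [[1, 0], [0, 1]]
for F in factors:
    product = matmul(product, F)

# Largest entrywise discrepancy between the product and the rotation matrix.
max_error = max(abs(product[i][j] - A[i][j]) for i in range(2) for j in range(2))
print(max_error)
```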


Exercises 


7.2.1. For each of the following linear transformations $L: \mathbb{R}^2 \to \mathbb{R}^2$, find a matrix representative, and then describe its effect on (i) the x-axis; (ii) the unit square $S = \{ 0 \leq x, y \leq 1 \}$; (iii) the unit disk $D = \{ x^2 + y^2 \leq 1 \}$: (a) counterclockwise rotation by 45°; (b) rotation by 180°; (c) reflection in the line $y = 2x$; (d) shear along the y-axis of magnitude 2; (e) shear along the line $x = y$ of magnitude 3; (f) orthogonal projection on the line $y = 2x$.

0  1 
1  0 

2  o 

L  =  L°L  is  rotation  by  180  .  Is  L  itself  a  rotation  or  a  reflection? 

7.2.3.  Let  L  be  the  linear  transformation  determined  by 
geometrically. 


Show  L2  =  I ,  and  interpret 


^ .  Show  that 


7.2.2.  Let  L  be  the  linear  transformation  represented  by  the  matrix 


7.2.4. What is the geometric interpretation of the linear transformation with matrix $A = \begin{pmatrix} 1 & 0 \\ 2 & -1 \end{pmatrix}$? Use this to explain why $A^2 = I$.


7.2.5.  Describe  the  image  of  the  line  i  that  goes  through  the  points 


-2 


1 

-2 


under  the 


linear  transformation 


2  3 
1  0 


7.2.6.  Draw  the  parallelogram  spanned  by  the  vectors 


and 


3 

1 


.  Then  draw  its  image 


under  the  linear  transformations  defined  by  the  following  matrices: 

/  J_  _J_\ 

\/2  V2 

1  1 


(a) 


1  0 

-1  1 


( b ) 


0  1 
1  0 


(c) 


1  2 
-1  4 


( d ) 


V  U2  V2  / 


(e) 


-1  -2 

2  1 


(0 


1 

2 
1 

2 


(g) 


2  -1 
-4  2 


7.2.7. Find a linear transformation that maps the unit circle $x^2 + y^2 = 1$ to the ellipse
\x2  A  \y2  =  1.  Is  your  answer  unique? 

7.2.8. Find a linear transformation that maps the unit sphere $x^2 + y^2 + z^2 = 1$ to the ellipsoid
x2  +  \  y2  +  jqZ2  =  l. 

7.2.9. True or false: A linear transformation $L\colon \mathbb{R}^2 \to \mathbb{R}^2$ maps (a) straight lines to straight lines; (b) triangles to triangles; (c) squares to squares; (d) circles to circles; (e) ellipses to ellipses.

♦ 7.2.10. (a) Prove that the linear transformation associated with the improper orthogonal matrix $\begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}$ is a reflection through the line that makes an angle $\tfrac12\theta$ with the x-axis. (b) Show that the composition of two such reflections, with angles $\theta$ and $\varphi$, is a rotation. What is the angle of the rotation? Does the composition depend upon the order of the two reflections?


7.2.11. (a) Find the matrix in $\mathbb{R}^3$ that corresponds to a counterclockwise rotation around the x-axis through an angle 60°. (b) Write it as a product of elementary matrices, and interpret each of the factors.
♦ 7.2.12. Let $L \subset \mathbb{R}^2$ be the line through the origin in the direction of a unit vector $\mathbf{u}$. (a) Prove that the matrix representative of reflection through $L$ is $R = 2\,\mathbf{u}\mathbf{u}^T - I$. (b) Find the corresponding formula for reflection through the line in the direction of a general nonzero vector $\mathbf{v} \ne \mathbf{0}$. (c) Determine the matrix representative for reflection through the line in the direction (i) $(1, 0)^T$, (ii) [vector garbled in extraction], (iii) $(1, 1)^T$, (iv) $(2, -3)^T$.


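7.2.13. Decompose the following matrices into a product of elementary matrices. Then interpret each of the factors as a linear transformation. [Parts (a)–(c) are lost in extraction.]
(d) $\begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$, (e) $\begin{pmatrix} 1 & 2 & 0 \\ 2 & 4 & 1 \\ 2 & 1 & 1 \end{pmatrix}$.

7.2.14. (a) Prove that
\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ b & 1 \end{pmatrix} \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}, \qquad \text{where } a = -\tan\tfrac12\theta \text{ and } b = \sin\theta. \]
(b) Is the factorization valid for all values of $\theta$? (c) Interpret the factorization geometrically. Remark. The factored version is less prone to numerical errors due to round-off, and so can be used when extremely accurate numerical computations involving rotations are required.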


7.2.15. Determine the matrix representative for orthogonal projection $P\colon \mathbb{R}^2 \to \mathbb{R}^2$ on the line through the origin in the direction (a) $(1, 0)^T$, (b) $(1, 1)^T$, (c) $(2, -3)^T$.

♦ 7.2.16. (a) Prove that every $2 \times 2$ matrix of rank 1 can be written in the form $A = \mathbf{u}\mathbf{v}^T$, where $\mathbf{u}, \mathbf{v} \in \mathbb{R}^2$ are nonzero column vectors. (b) Which rank one matrices correspond to orthogonal projection onto a one-dimensional subspace of $\mathbb{R}^2$?

7.2.17. Give a geometrical interpretation of the linear transformations on $\mathbb{R}^3$ defined by each of the six $3 \times 3$ permutation matrices (1.30).

7.2.18. Write down the $3 \times 3$ matrix representing a clockwise rotation in $\mathbb{R}^3$ around the x-axis by angle $\theta$.


7.2.19.  Explain  why  the  linear  map  defined  by  —  I  defines  a  rotation  in  two-dimensional  space, 
but  a  reflection  in  three-dimensional  space. 

♦ 7.2.20. Let $\mathbf{u} = (u_1, u_2, u_3)^T \in \mathbb{R}^3$ be a unit vector. Show that $Q_\pi = 2\,\mathbf{u}\mathbf{u}^T - I$ represents rotation around the axis $\mathbf{u}$ through an angle $\pi$.

♦ 7.2.21. Let $\mathbf{u} \in \mathbb{R}^3$ be a unit vector. (a) Explain why the elementary reflection matrix $R = I - 2\,\mathbf{u}\mathbf{u}^T$ represents a reflection through the plane orthogonal to $\mathbf{u}$. (b) Prove that $R$ is an orthogonal matrix. Is it proper or improper? (c) Write out $R$ when $\mathbf{u} =$ [the vectors for parts (i)–(iii) are garbled in extraction]. (d) Give a geometrical explanation why $Q_\pi = -R$ represents the rotation of Exercise 7.2.20.

♦ 7.2.22. Let $\mathbf{a} \in \mathbb{R}^3$, and let $Q$ be any $3 \times 3$ rotation matrix such that $Q\,\mathbf{a} = \mathbf{e}_3$. (a) Show, using the notation of (7.25), that $R_\theta = Q^T Z_\theta\, Q$ represents rotation around $\mathbf{a}$ by angle $\theta$. (b) Verify this formula in the case $\mathbf{a} = \mathbf{e}_2$ by comparing with (7.26).

♥ 7.2.23. Quaternions: The skew field $\mathbb{H}$ of quaternions can be identified with the vector space $\mathbb{R}^4$ equipped with a noncommutative multiplication operation. The standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4$ are traditionally denoted by the letters $1, i, j, k$; the vector $(a, b, c, d)^T \in \mathbb{R}^4$ corresponds to the quaternion $q = a + b\,i + c\,j + d\,k$. Quaternion addition coincides with vector addition. Quaternion multiplication is defined so that
\[ 1\,q = q = q\,1, \qquad i^2 = j^2 = k^2 = -1, \qquad i\,j = k = -j\,i, \quad i\,k = -j = -k\,i, \quad j\,k = i = -k\,j, \]
along with the distributive laws
\[ (q + r)\,s = q\,s + r\,s, \qquad q\,(r + s) = q\,r + q\,s, \qquad \text{for all } q, r, s \in \mathbb{H}. \]
(a) Compute the following quaternion products: (i) $j\,(2 - 3j + k)$, (ii) $(1 + i)(1 - 2i + j)$, (iii) $(1 + i - j - 3k)^2$, (iv) $(2 + 2i + 3j - k)(2 - 2i - 3j + k)$. (b) The conjugate of the quaternion $q = a + b\,i + c\,j + d\,k$ is defined to be $\bar q = a - b\,i - c\,j - d\,k$. Prove that $q\,\bar q = \|q\|^2 = \bar q\,q$, where $\|\cdot\|$ is the usual Euclidean norm on $\mathbb{R}^4$. (c) Prove that quaternion multiplication is associative. (d) Let $q = a + b\,i + c\,j + d\,k \in \mathbb{H}$. Show that $L_q[r] = q\,r$ and $R_q[r] = r\,q$ define linear transformations on the vector space $\mathbb{H} \simeq \mathbb{R}^4$. Write down their $4 \times 4$ matrix representatives, and observe that they are not the same, since quaternion multiplication is not commutative. (e) Show that $L_q$ and $R_q$ are orthogonal matrices if $\|q\|^2 = a^2 + b^2 + c^2 + d^2 = 1$. (f) We can identify a quaternion $q = b\,i + c\,j + d\,k$ with zero real part, $a = 0$, with a vector $\mathbf{q} = (b, c, d)^T \in \mathbb{R}^3$. Show that, in this case, the quaternion product $q\,r = \mathbf{q} \times \mathbf{r} - \mathbf{q} \cdot \mathbf{r}$ can be identified with the difference between the cross
and  dot  product  of  the  two  vectors.  Which  vector  identities  result  from  the  associativity 
of  quaternion  multiplication?  Remark.  The  quaternions  were  discovered  by  the  Irish 
mathematician  William  Rowan  Hamilton  in  1843.  Much  of  our  modern  vector  calculus 
notation  is  of  quaternionic  origin,  [IT]. 
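
The multiplication rules stated in Exercise 7.2.23 can be explored numerically. The following is a minimal sketch, assuming a quaternion $a + b\,i + c\,j + d\,k$ is stored as a 4-tuple `(a, b, c, d)`; the helper names `qmul` and `qconj` are hypothetical and chosen only for this illustration. It does not work out the exercise; it only checks a few of the defining relations.

```python
# Quaternions as 4-tuples (a, b, c, d) standing for a + b i + c j + d k.
# The names qmul and qconj are illustrative, not taken from the text.

def qmul(q, r):
    """Quaternion product, expanded from i^2 = j^2 = k^2 = -1, ij = k, jk = i, ki = j."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = r
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )

def qconj(q):
    """Conjugate: negate the i, j, k components."""
    a, b, c, d = q
    return (a, -b, -c, -d)

ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

print(qmul(I, J))  # (0, 0, 0, 1), i.e. k
print(qmul(J, I))  # (0, 0, 0, -1), i.e. -k: the product is not commutative
```

Storing the components in a plain tuple keeps the identification with $\mathbb{R}^4$ visible: addition is componentwise, and only the product needs its own rule.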


Change  of  Basis 

Sometimes  a  linear  transformation  represents  an  elementary  geometrical  transformation, 
but  this  is  not  evident  because  the  matrix  happens  to  be  written  in  the  “wrong”  coordinates. 
The characterization of linear functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ as multiplication by $m \times n$ matrices
in  Theorem  7.5  relies  on  using  the  standard  bases  for  both  the  domain  and  codomain.  In 
many  cases,  these  bases  are  not  particularly  well  adapted  to  the  linear  transformation  in 
question,  and  one  can  often  gain  additional  insight  by  adopting  more  suitable  bases.  To 
this  end,  we  first  need  to  understand  how  to  rewrite  a  linear  transformation  in  terms  of  a 
new  basis. 

The  following  result  says  that,  in  any  basis,  a  linear  function  on  finite-dimensional 
vector  spaces  can  always  be  realized  by  matrix  multiplication  of  the  coordinates.  But  bear 
in  mind  that  the  particular  matrix  representative  will  depend  upon  the  choice  of  bases. 


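Theorem 7.18. Let $L\colon V \to W$ be a linear function. Suppose $V$ has basis $\mathbf{v}_1, \dots, \mathbf{v}_n$ and $W$ has basis $\mathbf{w}_1, \dots, \mathbf{w}_m$. We can write
\[ \mathbf{v} = x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n \in V, \qquad \mathbf{w} = y_1 \mathbf{w}_1 + \cdots + y_m \mathbf{w}_m \in W, \]
where $\mathbf{x} = (x_1, x_2, \dots, x_n)^T$ are the coordinates of $\mathbf{v}$ relative to the basis of $V$ and $\mathbf{y} = (y_1, y_2, \dots, y_m)^T$ are those of $\mathbf{w}$ relative to the basis of $W$. Then, in these coordinates, the linear function $\mathbf{w} = L[\mathbf{v}]$ is given by multiplication by an $m \times n$ matrix $B$, so $\mathbf{y} = B\,\mathbf{x}$.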


Proof :  We  mimic  the  proof  of  Theorem  7.5,  replacing  the  standard  basis  vectors  by  more 
general  basis  vectors.  In  other  words,  we  will  apply  L  to  the  basis  vectors  of  V  and  express 
the  result  as  a  linear  combination  of  the  basis  vectors  in  W.  Specifically,  we  write 


\[ L[\mathbf{v}_j] = \sum_{i=1}^{m} b_{ij}\,\mathbf{w}_i, \qquad j = 1, \dots, n. \]
The coefficients $b_{ij}$ form the entries of the desired coefficient matrix. Indeed, by linearity,
\[ L[\mathbf{v}] = L[x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n] = x_1 L[\mathbf{v}_1] + \cdots + x_n L[\mathbf{v}_n] = \sum_{i=1}^{m} \Bigl( \sum_{j=1}^{n} b_{ij}\,x_j \Bigr) \mathbf{w}_i, \]
and so $y_i = \sum_{j=1}^{n} b_{ij}\,x_j$, as claimed. Q.E.D.

Suppose that the linear transformation $L\colon \mathbb{R}^n \to \mathbb{R}^m$ is represented by a certain $m \times n$ matrix $A$ relative to the standard bases $\mathbf{e}_1, \dots, \mathbf{e}_n$ and $\mathbf{e}_1, \dots, \mathbf{e}_m$ of the domain and codomain. If we introduce alternative bases for $\mathbb{R}^n$ and $\mathbb{R}^m$, then the same linear transformation may have a completely different matrix representation. Therefore, different matrices may represent the same underlying linear transformation, but relative to different bases of its domain and codomain.

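Example 7.19. Consider the linear transformation
\[ L[\mathbf{x}] = L\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 - x_2 \\ 2x_1 + 4x_2 \end{pmatrix}, \tag{7.27} \]
which we write in the standard, Cartesian coordinates on $\mathbb{R}^2$. The corresponding coefficient matrix
\[ A = \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix} \tag{7.28} \]
is the matrix representation of $L$, relative to the standard basis $\mathbf{e}_1, \mathbf{e}_2$ of $\mathbb{R}^2$, and can be read directly off the explicit formula (7.27):
\[ L[\mathbf{e}_1] = L\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \mathbf{e}_1 + 2\,\mathbf{e}_2, \qquad L[\mathbf{e}_2] = L\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 4 \end{pmatrix} = -\mathbf{e}_1 + 4\,\mathbf{e}_2. \]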


Let us see what happens if we replace the standard basis by the alternative basis
\[ \mathbf{v}_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} 1 \\ -2 \end{pmatrix}. \]
What is the corresponding matrix formulation of the same linear transformation? According to the recipe of Theorem 7.18, we must compute
\[ L[\mathbf{v}_1] = \begin{pmatrix} 2 \\ -2 \end{pmatrix} = 2\,\mathbf{v}_1, \qquad L[\mathbf{v}_2] = \begin{pmatrix} 3 \\ -6 \end{pmatrix} = 3\,\mathbf{v}_2. \]
The linear transformation acts by stretching in the direction $\mathbf{v}_1$ by a factor of 2 and simultaneously stretching in the direction $\mathbf{v}_2$ by a factor of 3. Therefore, the matrix form of $L$ with respect to this new basis is the diagonal matrix
\[ D = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}. \tag{7.29} \]
In general,
\[ L[a\,\mathbf{v}_1 + b\,\mathbf{v}_2] = 2a\,\mathbf{v}_1 + 3b\,\mathbf{v}_2, \]
and the effect is to multiply the new basis coordinates $\mathbf{a} = (a, b)^T$ by the diagonal matrix $D$. Both (7.28) and (7.29) represent the same linear transformation: the former in the standard basis and the latter in the new basis. The hidden geometry of this linear transformation is thereby exposed through an inspired choice of basis. The secret behind such well-adapted bases will be revealed in Chapter 8.

How does one effect a change of basis in general? According to formula (2.23), if $\mathbf{v}_1, \dots, \mathbf{v}_n$ form a basis of $\mathbb{R}^n$, then the coordinates $\mathbf{y} = (y_1, y_2, \dots, y_n)^T$ of a vector
\[ \mathbf{x} = (x_1, x_2, \dots, x_n)^T = y_1 \mathbf{v}_1 + y_2 \mathbf{v}_2 + \cdots + y_n \mathbf{v}_n \]
are found by solving the linear system
\[ S\,\mathbf{y} = \mathbf{x}, \qquad \text{where} \qquad S = (\,\mathbf{v}_1 \ \mathbf{v}_2 \ \dots \ \mathbf{v}_n\,) \tag{7.30} \]
is the nonsingular $n \times n$ matrix whose columns are the basis vectors.

Consider first a linear transformation $L\colon \mathbb{R}^n \to \mathbb{R}^n$ from $\mathbb{R}^n$ to itself. When written in terms of the standard basis, $L[\mathbf{x}] = A\,\mathbf{x}$ has a certain $n \times n$ coefficient matrix $A$. To change to the new basis $\mathbf{v}_1, \dots, \mathbf{v}_n$, we use (7.30) to rewrite the standard $\mathbf{x}$ coordinates in terms of the new $\mathbf{y}$ coordinates. We also need to find the coordinates $\mathbf{g}$ of an image vector $\mathbf{f} = A\,\mathbf{x}$ with respect to the new basis. By the same reasoning that led to (7.30), its new coordinates are found by solving the linear system $\mathbf{f} = S\,\mathbf{g}$. Therefore, the new codomain coordinates are expressed in terms of the new domain coordinates via
\[ \mathbf{g} = S^{-1}\mathbf{f} = S^{-1}A\,\mathbf{x} = S^{-1}A\,S\,\mathbf{y} = B\,\mathbf{y}. \]
We conclude that, in the new basis $\mathbf{v}_1, \dots, \mathbf{v}_n$, the matrix form of our linear transformation is
\[ B = S^{-1}A\,S, \qquad \text{where} \qquad S = (\,\mathbf{v}_1 \ \mathbf{v}_2 \ \dots \ \mathbf{v}_n\,). \tag{7.31} \]
Two matrices $A$ and $B$ that are related by such an equation for some nonsingular matrix $S$ are called similar. Similar matrices represent the same linear transformation, but relative to different bases of the underlying vector space $\mathbb{R}^n$, with the matrix $S$ serving to encode the change of basis.
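
The change of basis formula (7.31) is easy to check by machine. The following is a minimal sketch in plain Python using exact rational arithmetic; the helper names `matmul`, `inverse_2x2`, and `change_of_basis` are illustrative assumptions, and the inverse routine handles only the 2 × 2 case. It is applied to the matrix (7.28) and the basis $\mathbf{v}_1 = (1, -1)^T$, $\mathbf{v}_2 = (1, -2)^T$ of Example 7.19.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(S):
    """Exact inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("the columns of S must form a basis")
    return [[d / det, -b / det], [-c / det, a / det]]

def change_of_basis(A, S):
    """Matrix B = S^{-1} A S of the same map in the basis given by the columns of S."""
    return matmul(inverse_2x2(S), matmul(A, S))

A = [[1, -1], [2, 4]]    # coefficient matrix (7.28)
S = [[1, 1], [-1, -2]]   # columns are v1 and v2
B = change_of_basis(A, S)
print(B == [[2, 0], [0, 3]])  # True: B is the diagonal matrix (7.29)
```

For larger systems one would normally solve $S\,B = A\,S$ by Gaussian elimination rather than form $S^{-1}$ explicitly; the explicit inverse is used here only because the 2 × 2 case is so short.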


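Example 7.19 (continued). Returning to the preceding example, we assemble the new basis vectors to form the change of basis matrix $S = \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix}$, and verify that
\[ S^{-1}A\,S = \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} = D, \]
reconfirming our earlier computation.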


More generally, a linear transformation $L\colon \mathbb{R}^n \to \mathbb{R}^m$ is represented by an $m \times n$ matrix $A$ with respect to the standard bases on both the domain and codomain. What happens if we introduce a new basis $\mathbf{v}_1, \dots, \mathbf{v}_n$ on the domain space $\mathbb{R}^n$ and a new basis $\mathbf{w}_1, \dots, \mathbf{w}_m$ on the codomain $\mathbb{R}^m$? Arguing as above, we conclude that the matrix representative of $L$ with respect to these new bases is given by
\[ B = T^{-1}A\,S, \tag{7.32} \]
where $S = (\,\mathbf{v}_1 \ \mathbf{v}_2 \ \dots \ \mathbf{v}_n\,)$ is the domain basis matrix, while $T = (\,\mathbf{w}_1 \ \mathbf{w}_2 \ \dots \ \mathbf{w}_m\,)$ is the image basis matrix.


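In particular, suppose that $L$ has rank $r = \dim \operatorname{img} A = \dim \operatorname{coimg} A$. Let us choose a basis $\mathbf{v}_1, \dots, \mathbf{v}_n$ of $\mathbb{R}^n$ such that $\mathbf{v}_1, \dots, \mathbf{v}_r$ form a basis of $\operatorname{coimg} A$, while $\mathbf{v}_{r+1}, \dots, \mathbf{v}_n$ form a basis for $\ker A = (\operatorname{coimg} A)^\perp$. According to Theorem 4.49, the image vectors $\mathbf{w}_1 = L[\mathbf{v}_1], \dots, \mathbf{w}_r = L[\mathbf{v}_r]$ form a basis for $\operatorname{img} A$, while $L[\mathbf{v}_{r+1}] = \cdots = L[\mathbf{v}_n] = \mathbf{0}$. We further choose a basis $\mathbf{w}_{r+1}, \dots, \mathbf{w}_m$ for $\operatorname{coker} A = (\operatorname{img} A)^\perp$, and note that the combination $\mathbf{w}_1, \dots, \mathbf{w}_m$ is a basis for $\mathbb{R}^m$. The matrix form of $L$ relative to these two adapted bases is
\[ B = T^{-1}A\,S = \begin{pmatrix} I & O \\ O & O \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix}, \tag{7.33} \]
where $I$ denotes the $r \times r$ identity matrix and $O$ denotes a zero block of the appropriate size.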


In this matrix, the first r columns have a single 1 in the diagonal slot, indicating that the
first r basis vectors of the domain space are mapped to the first r basis vectors of the
codomain,  while  the  last  n  —  r  columns  are  all  zero,  indicating  that  the  last  n  —  r  basis 
vectors  in  the  domain  are  all  mapped  to  0.  Thus,  by  a  suitable  choice  of  bases  on  both 
the  domain  and  codomain,  every  linear  transformation  has  an  extremely  simple  canonical 
form  (7.33)  that  depends  only  on  its  rank. 

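Example 7.20. According to the example following Theorem 2.49, the matrix
\[ A = \begin{pmatrix} 2 & -1 & 1 & 2 \\ -8 & 4 & -6 & -4 \\ 4 & -2 & 3 & 2 \end{pmatrix} \]
has rank 2. Based on those calculations, we choose the domain space basis
\[ \mathbf{v}_1 = \begin{pmatrix} 2 \\ -1 \\ 1 \\ 2 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} 0 \\ 0 \\ -2 \\ 4 \end{pmatrix}, \qquad \mathbf{v}_3 = \begin{pmatrix} \tfrac12 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \qquad \mathbf{v}_4 = \begin{pmatrix} -2 \\ 0 \\ 2 \\ 1 \end{pmatrix}, \]
noting that $\mathbf{v}_1, \mathbf{v}_2$ are a basis for $\operatorname{coimg} A$, while $\mathbf{v}_3, \mathbf{v}_4$ are a basis for $\ker A$. For our basis of the codomain, we first compute $\mathbf{w}_1 = A\,\mathbf{v}_1$ and $\mathbf{w}_2 = A\,\mathbf{v}_2$, which form a basis for $\operatorname{img} A$. We supplement these by the single basis vector $\mathbf{w}_3$ for $\operatorname{coker} A$, where
\[ \mathbf{w}_1 = \begin{pmatrix} 10 \\ -34 \\ 17 \end{pmatrix}, \qquad \mathbf{w}_2 = \begin{pmatrix} 6 \\ -4 \\ 2 \end{pmatrix}, \qquad \mathbf{w}_3 = \text{a nonzero multiple of } \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}. \]
By construction, $A\,\mathbf{v}_1 = \mathbf{w}_1$, $A\,\mathbf{v}_2 = \mathbf{w}_2$, $A\,\mathbf{v}_3 = A\,\mathbf{v}_4 = \mathbf{0}$, and thus the canonical matrix form of this particular linear function is given in terms of these two bases as
\[ B = T^{-1}A\,S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \]
where the bases are assembled to form the matrices
\[ S = \begin{pmatrix} 2 & 0 & \tfrac12 & -2 \\ -1 & 0 & 1 & 0 \\ 1 & -2 & 0 & 2 \\ 2 & 4 & 0 & 1 \end{pmatrix}, \qquad T = (\,\mathbf{w}_1 \ \mathbf{w}_2 \ \mathbf{w}_3\,). \]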
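
The canonical form of Example 7.20 can be confirmed without inverting $T$, since $B = T^{-1}A\,S$ is equivalent to $A\,S = T\,B$. The sketch below, in plain Python with exact fractions, checks this identity. The specific cokernel vector $\mathbf{w}_3 = (0, 1, 2)^T$ is an assumption made for the illustration (any nonzero multiple would serve equally well), and the helper names are hypothetical.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[2, -1, 1, 2], [-8, 4, -6, -4], [4, -2, 3, 2]]

# Domain basis vectors v1..v4, listed as rows and then transposed into columns of S.
S = transpose([[2, -1, 1, 2], [0, 0, -2, 4], [Fr(1, 2), 1, 0, 0], [-2, 0, 2, 1]])

AS = matmul(A, S)
w1 = [row[0] for row in AS]   # w1 = A v1
w2 = [row[1] for row in AS]   # w2 = A v2
w3 = [0, 1, 2]                # assumed choice of cokernel basis vector
T = transpose([w1, w2, w3])

B = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]  # canonical form of rank 2

print(w1, w2)                 # [10, -34, 17] [6, -4, 2]
print(AS == matmul(T, B))     # True, so B = T^{-1} A S
```

Checking $A\,S = T\,B$ instead of computing $T^{-1}$ keeps the arithmetic exact and avoids writing a general matrix inversion routine.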
Exercises


7.2.24. Find the matrix form of the linear transformation $L(x, y) = \begin{pmatrix} x - 4y \\ -2x + 3y \end{pmatrix}$ with respect to the following bases of $\mathbb{R}^2$:


/  —3  2 


7.2.25.  Find  the  matrix  form  of  L[x]  = 

-3 

bases  of  R3 

• 

\ 

v-1 

/  2  \ 

(  0\ 

(  °^i 

(l\ 

(a) 

0 

? 

-1 

1 

0 

.  (0 

0 

V  o  J 

w 

2  \ 

3 

oj 


x  with  respect  to  the  following 


(1\ 

(A 

(  °\ 

(  1\ 

1 

.  (c) 

1  ’ 

1  ’ 

-2 

VI ) 

A) 

V  iy 

7.2.26.  Find  bases  of  the  domain  and  codomain  that  place  the  following  matrices  in  the 


canonical  form  (7.33). 


Use  (7.32)  to  check  your  answer. 


(  2 

0 

v-i 


2 

-1 

1 


3 

6 

-3 

0 


1\ 

2 

3 

4/ 


7.2.27. (a) Show that every invertible linear function $L\colon \mathbb{R}^n \to \mathbb{R}^n$ can be represented by the identity matrix by choosing appropriate (and not necessarily the same) bases on the domain and codomain. (b) Which linear transformations are represented by the identity matrix when the domain and codomain are required to have the same basis? (c) Find bases of $\mathbb{R}^2$ so that the following linear transformations are represented by the identity matrix: (i) the scaling map $S[\mathbf{x}] = 2\,\mathbf{x}$; (ii) counterclockwise rotation by 45°; (iii) the shear


♦ 7.2.28. Suppose a linear transformation $L\colon \mathbb{R}^n \to \mathbb{R}^n$ is represented by a symmetric matrix with respect to the standard basis $\mathbf{e}_1, \dots, \mathbf{e}_n$. (a) Prove that its matrix representative with respect to any orthonormal basis $\mathbf{u}_1, \dots, \mathbf{u}_n$ is symmetric. (b) Is it symmetric when expressed in terms of a non-orthonormal basis?

♦ 7.2.29. In this exercise, we show that every inner product $\langle \cdot\,, \cdot \rangle$ on $\mathbb{R}^n$ can be reduced to the dot product when expressed in a suitably adapted basis. (a) Specifically, prove that there exists a basis $\mathbf{v}_1, \dots, \mathbf{v}_n$ of $\mathbb{R}^n$ such that $\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^{n} c_i d_i = \mathbf{c} \cdot \mathbf{d}$, where $\mathbf{c} = (c_1, c_2, \dots, c_n)^T$ are the coordinates of $\mathbf{x}$ and $\mathbf{d} = (d_1, d_2, \dots, d_n)^T$ those of $\mathbf{y}$ with respect to the basis. Is the basis uniquely determined? (b) Find bases that reduce the following inner products to the dot product on $\mathbb{R}^2$:
(i) $\langle \mathbf{v}, \mathbf{w} \rangle = 2 v_1 w_1 + 3 v_2 w_2$, (ii) $\langle \mathbf{v}, \mathbf{w} \rangle = v_1 w_1 - v_1 w_2 - v_2 w_1 + 3 v_2 w_2$.

♥ 7.2.30. Dual functions: Let $L\colon V \to W$ be a linear function between vector spaces. The dual linear function, denoted by $L^*\colon W^* \to V^*$ (note the change in direction), is defined so that $L^*(m) = m \circ L$ for all linear functions $m \in W^*$. (a) Prove that $L^*$ is a linear function. (b) If $M\colon W \to Z$ is linear, prove that $(M \circ L)^* = L^* \circ M^*$. (c) Suppose $\dim V = n$ and $\dim W = m$. Prove that if $L$ is represented by the $m \times n$ matrix $A$ with respect to bases of $V, W$, then $L^*$ is represented by the $n \times m$ transposed matrix $A^T$ with respect to the dual bases, as defined in Exercise 7.1.32.
Figure 7.10. Translation.

♦ 7.2.31. Suppose $A$ is an $m \times n$ matrix. (a) Let $\mathbf{v}_1, \dots, \mathbf{v}_n$ be a basis of $\mathbb{R}^n$, and $A\,\mathbf{v}_i = \mathbf{w}_i \in \mathbb{R}^m$, for $i = 1, \dots, n$. Prove that the vectors $\mathbf{v}_1, \dots, \mathbf{v}_n, \mathbf{w}_1, \dots, \mathbf{w}_n$ serve to uniquely specify $A$. (b) Write down a formula for $A$.


7.3  Affine  Transformations  and  Isometries 


Not every transformation of importance in geometrical applications arises as a linear function. A simple example is a translation, whereby all the points in $\mathbb{R}^n$ are moved in the same direction by a common distance. The function that accomplishes this is
\[ T[\mathbf{x}] = \mathbf{x} + \mathbf{b}, \qquad \mathbf{x} \in \mathbb{R}^n, \tag{7.34} \]
where $\mathbf{b} \in \mathbb{R}^n$ determines the direction and the distance that the points are translated. Except in the trivial case $\mathbf{b} = \mathbf{0}$, the translation $T$ is not a linear function because
\[ T[\mathbf{x} + \mathbf{y}] = \mathbf{x} + \mathbf{y} + \mathbf{b} \ne T[\mathbf{x}] + T[\mathbf{y}] = \mathbf{x} + \mathbf{y} + 2\,\mathbf{b}. \]
Or, even more simply, we note that $T[\mathbf{0}] = \mathbf{b}$, which must be $\mathbf{0}$ if $T$ is to be linear.

Combining  translations  and  linear  functions  leads  us  to  an  important  class  of  geometrical 
transformations. 


Definition 7.21. A function $F\colon \mathbb{R}^n \to \mathbb{R}^n$ of the form
\[ F[\mathbf{x}] = A\,\mathbf{x} + \mathbf{b}, \tag{7.35} \]
where $A$ is an $n \times n$ matrix and $\mathbf{b} \in \mathbb{R}^n$, is called an affine transformation.

In general, $F[\mathbf{x}]$ is an affine transformation if and only if $L[\mathbf{x}] = F[\mathbf{x}] - F[\mathbf{0}]$ is a linear function. In the particular case (7.35), $F[\mathbf{0}] = \mathbf{b}$, and so $L[\mathbf{x}] = A\,\mathbf{x}$. The word “affine” comes from the Latin “affinis”, meaning “related”, because such transformations preserve the relation of parallelism between lines; see Exercise 7.3.2.

For example, every affine transformation from $\mathbb{R}$ to $\mathbb{R}$ has the form
\[ f(x) = \alpha\,x + \beta. \tag{7.36} \]
As mentioned earlier, even though the graph of $f(x)$ is a straight line, $f$ is not a linear function unless $\beta = 0$, in which case the line goes through the origin. Thus, to be mathematically accurate, we should refer to (7.36) as a one-dimensional affine transformation.
Example 7.22. The affine transformation
\[ F(x, y) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \begin{pmatrix} -y + 1 \\ x - 2 \end{pmatrix} \]
has the effect of first rotating the plane $\mathbb{R}^2$ by 90° about the origin, and then translating by the vector $(1, -2)^T$. The reader may enjoy proving that this combination has the same effect as just rotating the plane through an angle of 90° centered at the point $\bigl(\tfrac32, -\tfrac12\bigr)$. For details, see Exercise 7.3.14.

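Note that the affine transformation (7.35) can be obtained by composing a linear function $L[\mathbf{x}] = A\,\mathbf{x}$ with a translation $T[\mathbf{x}] = \mathbf{x} + \mathbf{b}$, so
\[ F[\mathbf{x}] = T \circ L[\mathbf{x}] = T[L[\mathbf{x}]] = T[A\,\mathbf{x}] = A\,\mathbf{x} + \mathbf{b}. \]
The order of composition is important, since $G = L \circ T$ defines the slightly different affine transformation
\[ G[\mathbf{x}] = L \circ T[\mathbf{x}] = L[T[\mathbf{x}]] = L[\mathbf{x} + \mathbf{b}] = A(\mathbf{x} + \mathbf{b}) = A\,\mathbf{x} + \mathbf{c}, \qquad \text{where} \qquad \mathbf{c} = A\,\mathbf{b}. \]
More generally, the composition of any two affine transformations is again an affine transformation. Specifically, given
\[ F[\mathbf{x}] = A\,\mathbf{x} + \mathbf{a}, \qquad G[\mathbf{y}] = B\,\mathbf{y} + \mathbf{b}, \]
then
\[ (G \circ F)[\mathbf{x}] = G[F[\mathbf{x}]] = G[A\,\mathbf{x} + \mathbf{a}] = B(A\,\mathbf{x} + \mathbf{a}) + \mathbf{b} = C\,\mathbf{x} + \mathbf{c}, \qquad \text{where} \quad C = B\,A, \quad \mathbf{c} = B\,\mathbf{a} + \mathbf{b}. \tag{7.37} \]
Note that the coefficient matrix of the composition is the product of the coefficient matrices, but the resulting vector of translation is not the sum of the two translation vectors.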
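
The composition rule (7.37) can be illustrated with a short computation. The following is a minimal sketch, assuming an affine map $F[\mathbf{x}] = A\,\mathbf{x} + \mathbf{a}$ is stored as a pair `(A, a)`; the function names `apply_affine` and `compose`, and the sample map `G`, are hypothetical choices made for this illustration. The map `F` is the rotation-plus-translation of Example 7.22.

```python
from fractions import Fraction as Fr

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def mat_mul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def apply_affine(F, x):
    """Evaluate F[x] = A x + a for F = (A, a)."""
    A, a = F
    return [y + ai for y, ai in zip(mat_vec(A, x), a)]

def compose(G, F):
    """G o F as in (7.37): matrix B A, translation vector B a + b."""
    (B, b), (A, a) = G, F
    return (mat_mul(B, A), [y + bi for y, bi in zip(mat_vec(B, a), b)])

F = ([[0, -1], [1, 0]], [1, -2])   # Example 7.22: rotate by 90 degrees, then translate
G = ([[2, 0], [0, 1]], [0, 3])     # an arbitrary second affine map, chosen for illustration

x = [5, 7]
print(apply_affine(compose(G, F), x) == apply_affine(G, apply_affine(F, x)))  # True
print(compose(G, F) == compose(F, G))                                         # False

# The center of rotation claimed in Example 7.22 is a fixed point of F.
c = [Fr(3, 2), Fr(-1, 2)]
print(apply_affine(F, c) == c)                                                # True
```

The second printed line reflects the remark in the text that the order of composition matters: both the matrices and the translation vectors of $G \circ F$ and $F \circ G$ differ here.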


Exercises 


7.3.1.  True  or  false:  An  affine  transformation  takes  (a)  straight  lines  to  straight  lines; 

(b)  triangles  to  triangles;  (c)  squares  to  squares;  (d)  circles  to  circles;  (e)  ellipses  to  ellipses. 

♦ 7.3.2. (a) Let $F\colon \mathbb{R}^n \to \mathbb{R}^n$ be an affine transformation. Let $L_1, L_2 \subset \mathbb{R}^n$ be two parallel lines. Prove that $F[L_1]$ and $F[L_2]$ are also parallel lines.

(b) Is the converse valid: if $F\colon \mathbb{R}^n \to \mathbb{R}^n$ maps parallel lines to parallel lines, then $F$ is necessarily an affine transformation?


7.3.3.  Describe  the  image  of  (i)  the  x-axis,  (ii)  the  unit  disk  x2  +  y2  <  1,  (in)  the  unit  square 


VI 

o 

ry»  / 
^  1  , 

(a) 

r 

(c) 

t3 

(e) 

T, 

(g) 
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y 
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+ 

2 
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2 

-1 
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+ 
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.8  .6 

1  1 
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X 

y 


(b)  t2 

1 

2 

+ 


x 

y 


3 

0 


(d)  T4 


x 

y 


+ 


-3 

2 

2 

-3 


(f)  T 
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(h)  F 
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X 

y 
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-l 
7.3.4. Using the affine transformations in Exercise 7.3.3, write out the following compositions and verify that they satisfy (7.37):

(a) $T_3 \circ T_4$, (b) $T_4 \circ T_3$, (c) $T_3 \circ T_6$, (d) $T_6 \circ T_3$, (e) $T_7 \circ T_8$, (f) $T_8 \circ T_7$.

7.3.5.  Describe  the  image  of  the  triangle  with  vertices  (—1,  0),  (1,  0),  (0,  2)  under  the  affine 

x\  ( 4  —  1  \  /  x  \  ,  /  3 


transformation  T 


V 


V 


+ 


4 


7.3.6.  Under  what  conditions  is  the  composition  of  two  affine  transformations 

(a)  a  translation?  (b)  a  linear  function? 

7.3.7. (a) Under what conditions does an affine transformation have an inverse? (b) Is the inverse an affine transformation? If so, find a formula for its matrix and vector constituents. (c) Find the inverse, when it exists, of each of the affine transformations in Exercise 7.3.3.

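♦ 7.3.8. Let $\mathbf{v}_1, \dots, \mathbf{v}_n$ be a basis for $\mathbb{R}^n$. (a) Show that every affine transformation $F[\mathbf{x}] = A\,\mathbf{x} + \mathbf{b}$ on $\mathbb{R}^n$ is uniquely determined by the $n + 1$ vectors $\mathbf{w}_0 = F[\mathbf{0}]$, $\mathbf{w}_1 = F[\mathbf{v}_1]$, $\dots$, $\mathbf{w}_n = F[\mathbf{v}_n]$. (b) Find the formula for $A$ and $\mathbf{b}$ when $\mathbf{v}_1 = \mathbf{e}_1, \dots, \mathbf{v}_n = \mathbf{e}_n$ are the standard basis vectors. (c) Find the formula for $A, \mathbf{b}$ for a general basis $\mathbf{v}_1, \dots, \mathbf{v}_n$.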


7.3.9. Show that the space of all affine transformations on $\mathbb{R}^n$ is a vector space. What is its dimension?


♦ 7.3.10. In this exercise, we establish a useful matrix representation for affine transformations. We identify $\mathbb{R}^n$ with the $n$-dimensional affine subspace (as in Exercise 2.2.28)
\[ V_n = \{\, (\mathbf{x}, 1)^T = (x_1, \dots, x_n, 1)^T \,\} \subset \mathbb{R}^{n+1} \]
consisting of vectors whose last coordinate is fixed at $x_{n+1} = 1$. (a) Show that multiplication of vectors $(\mathbf{x}, 1)^T \in V_n$ by the $(n + 1) \times (n + 1)$ affine matrix $\begin{pmatrix} A & \mathbf{b} \\ \mathbf{0} & 1 \end{pmatrix}$ coincides with the action (7.35) of an affine transformation on $\mathbf{x} \in \mathbb{R}^n$. (b) Prove that the composition law (7.37) for affine transformations corresponds to multiplication of their affine matrices. (c) Define the inverse of an affine transformation in the evident manner, and show that it corresponds to the inverse affine matrix.

Isometry 

A  transformation  that  preserves  distance  is  known  as  an  isometry.  (The  mathematical 
term  metric  refers  to  the  underlying  norm  or  distance  on  the  space;  thus,  “isometric” 
translates  as  “distance-preserving”.)  In  Euclidean  geometry,  the  isometries  coincide  with 
the  rigid  motions  —  translations,  rotations,  reflections,  and  the  affine  maps  they  generate 
through  composition. 

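Definition 7.23. Let $V$ be a normed vector space. A function $F\colon V \to V$ is called an isometry if it preserves distance, meaning
\[ d(F[\mathbf{v}], F[\mathbf{w}]) = d(\mathbf{v}, \mathbf{w}) \qquad \text{for all} \quad \mathbf{v}, \mathbf{w} \in V. \tag{7.38} \]
Since the distance between points is just the norm of the vector connecting them, $d(\mathbf{v}, \mathbf{w}) = \|\mathbf{v} - \mathbf{w}\|$, cf. (3.33), the isometry condition (7.38) can be restated as
\[ \|F[\mathbf{v}] - F[\mathbf{w}]\| = \|\mathbf{v} - \mathbf{w}\| \qquad \text{for all} \quad \mathbf{v}, \mathbf{w} \in V. \tag{7.39} \]
Clearly, any translation $T[\mathbf{v}] = \mathbf{v} + \mathbf{a}$, where $\mathbf{a} \in V$, defines an isometry, since $T[\mathbf{v}] - T[\mathbf{w}] = \mathbf{v} - \mathbf{w}$. A linear transformation $L\colon V \to V$ defines an isometry if and only if
\[ \|L[\mathbf{v}]\| = \|\mathbf{v}\| \qquad \text{for all} \quad \mathbf{v} \in V, \tag{7.40} \]
because, by linearity, $\|L[\mathbf{v}] - L[\mathbf{w}]\| = \|L[\mathbf{v} - \mathbf{w}]\| = \|\mathbf{v} - \mathbf{w}\|$. A similar computation proves that an affine transformation $F[\mathbf{v}] = L[\mathbf{v}] + \mathbf{a}$ is an isometry if and only if its linear part $L[\mathbf{v}]$ is.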

As  noted  above,  the  simplest  class  of  isometries  comprises  the  translations 


T[x]  =  x  +  b 


(7.41) 


in the direction $\mathbf{b}$. For the standard Euclidean norm on $V = \mathbb{R}^n$, the linear isometries consist of rotations and reflections. As we shall prove, both are characterized by orthogonal matrices:
\[ L[\mathbf{x}] = Q\,\mathbf{x}, \qquad \text{where} \qquad Q^T Q = I. \tag{7.42} \]
The proper isometries correspond to the rotations, with $\det Q = +1$, and can be realized as physical motions; improper isometries, with $\det Q = -1$, are then obtained by reflection in a mirror.


Proposition 7.24. A linear transformation $L[\mathbf{x}] = Q\,\mathbf{x}$ defines a Euclidean isometry of $\mathbb{R}^n$ if and only if $Q$ is an orthogonal matrix.

Proof: The linear isometry condition (7.40) requires that
\[ \|Q\,\mathbf{x}\|^2 = (Q\,\mathbf{x})^T Q\,\mathbf{x} = \mathbf{x}^T Q^T Q\,\mathbf{x} = \mathbf{x}^T \mathbf{x} = \|\mathbf{x}\|^2 \qquad \text{for all} \quad \mathbf{x} \in \mathbb{R}^n. \]
According to Exercise 4.3.16, this holds if and only if $Q^T Q = I$, which is precisely the condition (4.29) that $Q$ be an orthogonal matrix. Q.E.D.
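
Proposition 7.24 can be spot-checked numerically. The sketch below uses exact fractions so that the equalities are not blurred by round-off; the sample matrices (a rotation and a reflection built from the 3-4-5 triangle, and a shear) and the helper names are assumptions made for this illustration, not examples from the text.

```python
from fractions import Fraction as Fr

def transpose(Q):
    return [list(col) for col in zip(*Q)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_orthogonal(Q):
    """Test the condition Q^T Q = I from (7.42)."""
    n = len(Q)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(transpose(Q), Q) == identity

def norm_sq(x):
    return sum(xi * xi for xi in x)

def apply(Q, x):
    return [sum(q * xi for q, xi in zip(row, x)) for row in Q]

def det2(Q):
    return Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]

rotation = [[Fr(3, 5), Fr(-4, 5)], [Fr(4, 5), Fr(3, 5)]]    # proper: det = +1
reflection = [[Fr(3, 5), Fr(4, 5)], [Fr(4, 5), Fr(-3, 5)]]  # improper: det = -1
shear = [[1, 2], [0, 1]]                                    # not orthogonal

x = [2, -7]
for name, Q in [("rotation", rotation), ("reflection", reflection), ("shear", shear)]:
    print(name, is_orthogonal(Q), norm_sq(apply(Q, x)) == norm_sq(x), det2(Q))
```

A single test vector can only refute the isometry property, never prove it; the orthogonality test $Q^T Q = I$ is the condition that settles the question for all vectors at once.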


It can be proved, [93], that the most general Euclidean isometry of $\mathbb{R}^n$ is an affine transformation, and hence of the form $F[\mathbf{x}] = Q\,\mathbf{x} + \mathbf{b}$, where $Q$ is an orthogonal matrix and $\mathbf{b}$ is a vector. Therefore, every Euclidean isometry or rigid motion is a combination of translations, rotations, and reflections.

In the two-dimensional case, the proper linear isometries $R[\mathbf{x}] = Q\,\mathbf{x}$ with $\det Q = 1$ represent rotations around the origin. More generally, a rotation of the plane around a center at $\mathbf{c}$ is represented by the affine isometry
\[ R[\mathbf{x}] = Q(\mathbf{x} - \mathbf{c}) + \mathbf{c} = Q\,\mathbf{x} + \mathbf{b}, \qquad \text{where} \qquad \mathbf{b} = (I - Q)\,\mathbf{c}, \tag{7.43} \]
and where $Q$ is a rotation matrix. In Exercise 7.3.14, we ask you to prove that every plane isometry is either a translation or a rotation around a center.

In  three-dimensional  space,  both  translations  (7.41)  and  rotations  around  a  center  (7.43) 
continue  to  define  proper  isometries.  There  is  one  additional  type,  representing  the  motion 
of  a  point  on  the  head  of  a  screw.  A  screw  motion  is  an  affine  transformation  of  the  form 


Qx  +  a, 


(7.44) 


where  the  3x3  orthogonal  matrix  Q  represents  a  rotation  through  an  angle  9  around  a 
fixed  axis  in  the  direction  of  the  vector  a,  which  is  also  the  direction  of  the  translation 
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Figure  7.11.  A  Screw  Motion. 


term.  The  result  is  indicated  in  Figure  7.11;  the  trajectory  followed  by  a  point  not  on  the 
axis  is  a  circular  helix  centered  on  the  axis.  For  example, 

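$$F\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ a \end{pmatrix}$$

represents a vertical screw along the z-axis through an angle $\theta$ by a distance $a$. In Exercise
8.2.45 you are asked to prove that every proper isometry of $\mathbb{R}^3$ is either a translation, a
rotation, or a screw motion.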
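To see the helix emerge, one can iterate the vertical screw motion on a single point. The sketch below is only an illustration under assumed sample values (starting point, angle, and pitch are hypothetical); it checks that the distance from the z-axis stays constant while the height grows by $a$ at each step.

```python
import math

def screw(point, theta, a):
    """One step of the vertical screw: rotate by theta about the z-axis, then shift z by a."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z + a)

def trajectory(point, theta, a, steps):
    """Iterate the screw motion; the resulting points lie on a circular helix."""
    pts = [point]
    for _ in range(steps):
        pts.append(screw(pts[-1], theta, a))
    return pts

# Hypothetical sample data: start at (1, 0, 0), rotate 30 degrees and rise 0.25 per step.
pts = trajectory((1.0, 0.0, 0.0), math.pi / 6, 0.25, 12)
radii = [math.hypot(x, y) for x, y, _ in pts]
heights = [z for _, _, z in pts]

print(max(abs(r - 1.0) for r in radii) < 1e-12)   # distance from the axis is constant
print(heights[-1])                                 # twelve steps of height 0.25 each
```

After twelve steps of 30° the point has made one full turn around the axis and returned to its original $x, y$ coordinates, three units higher.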

The isometries of $\mathbb{R}^2$ and $\mathbb{R}^3$ are indispensable for understanding how physical objects
move in three-dimensional space. Basic computer graphics and animation require efficient
implementation of rigid isometries in three-dimensional space and their compositions,
coupled with appropriate (nonlinear) perspective maps prescribing the projection of
three-dimensional objects onto a two-dimensional viewing screen, [12, 72].


Exercises 


Note :  All  exercises  are  based  on  the  Euclidean  norm  unless  otherwise  noted. 

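7.3.11. Which of the indicated maps $F(x,y)$ define isometries of the Euclidean plane?
(a) $\begin{pmatrix} y \\ -x \end{pmatrix}$, (b) $\begin{pmatrix} x - 2 \\ y \end{pmatrix}$, (c) $\begin{pmatrix} x - y + 1 \\ x + 2 \end{pmatrix}$, (d) $\dfrac{1}{\sqrt{2}} \begin{pmatrix} x + y - 3 \\ x + y - 2 \end{pmatrix}$, (e) $\dfrac{1}{5} \begin{pmatrix} 3x + 4y \\ 4x + 3y + 1 \end{pmatrix}$.

7.3.12. Prove that the planar affine isometry $F\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y + 1 \\ x - 2 \end{pmatrix}$ represents a rotation
through an angle of 90° around the center $\left( \tfrac{3}{2}, -\tfrac{1}{2} \right)^T$.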

7.3.13. True or false: The map $L[\mathbf{x}] = -\mathbf{x}$ for $\mathbf{x} \in \mathbb{R}^n$ defines (a) an isometry; (b) a rotation.

♦ 7.3.14. Prove that every proper affine plane isometry $F[\mathbf{x}] = Q\mathbf{x} + \mathbf{b}$ of $\mathbb{R}^2$, where $\det Q = 1$,
is either (i) a translation, or (ii) a rotation (7.43) centered at some point $\mathbf{c} \in \mathbb{R}^2$.

Hint: Use Exercise 1.5.7.

7.3.15. Compute both compositions $F \circ G$ and $G \circ F$ of the following affine transformations on
$\mathbb{R}^2$. Which pairs commute? (a) F = counterclockwise rotation around the origin by 45°;
G = translation in the y direction by 3 units. (b) F = counterclockwise rotation around
the point $(1,1)^T$ by 30°; G = counterclockwise rotation around the point $(-2,1)^T$ by 90°.
(c) F = reflection through the line $y = x + 1$; G = rotation around $(1,1)^T$ by 180°.

♥ 7.3.16. In $\mathbb{R}^2$, show the following: (a) The composition of two affine isometries is another
affine isometry. (b) The composition of two translations is another translation. (c) The
composition of a translation and a rotation (not necessarily centered at the origin) in either
order is a rotation. (d) The composition of two plane rotations is either another rotation or
a translation. What is the condition for the latter possibility? (e) Every plane translation
can be written as the composition of two rotations.

♦ 7.3.17. Let $\ell$ be a line in $\mathbb{R}^2$. A glide reflection is an affine map on $\mathbb{R}^2$ composed
of a translation in the direction of $\ell$ by a distance $d$ followed by a reflection
through $\ell$. Find the formula for a glide reflection along
(a) the x-axis by a distance 2; (b) the line $y = x$ by a distance 3 in
the direction of increasing $x$; (c) the line $x + y = 1$ by a distance 2
in the direction of increasing $x$.

♦ 7.3.18. Let $\ell$ be the line in the direction of the unit vector $\mathbf{u}$ through the point $\mathbf{a}$. (a) Write
down the formula for the affine map defining the reflection through the line $\ell$. Hint: Use
Exercise 7.2.12. (b) Write down the formula for the glide reflection, as defined in Exercise
7.3.17, along $\ell$ by a distance $d$ in the direction of $\mathbf{u}$. (c) Prove that every improper affine
plane isometry is either a reflection or a glide reflection. Hint: Use Exercise 7.2.10.

♥ 7.3.19. A set of $n + 1$ points $\mathbf{a}_0, \ldots, \mathbf{a}_n \in \mathbb{R}^n$ is said to be in general position if the differences
$\mathbf{a}_i - \mathbf{a}_j$ span $\mathbb{R}^n$. (a) Show that the points are in general position if and only if they do
not all lie in a proper affine subspace $A \subsetneq \mathbb{R}^n$, cf. Exercise 2.2.28. (b) Let $\mathbf{a}_0, \ldots, \mathbf{a}_n$ and
$\mathbf{b}_0, \ldots, \mathbf{b}_n$ be two sets in general position. Show that there is an isometry $F: \mathbb{R}^n \to \mathbb{R}^n$
such that $F[\mathbf{a}_i] = \mathbf{b}_i$ for all $i = 0, \ldots, n$, if and only if their interpoint distances agree:
$\| \mathbf{a}_i - \mathbf{a}_j \| = \| \mathbf{b}_i - \mathbf{b}_j \|$ for all $0 \le i < j \le n$. Hint: Use Exercise 4.3.19.

♦ 7.3.20. Suppose that $V$ is an inner product space and $L: V \to V$ is an isometry, so
$\| L[\mathbf{v}] \| = \| \mathbf{v} \|$ for all $\mathbf{v} \in V$. Prove that $L$ also preserves the inner product:
$\langle L[\mathbf{v}], L[\mathbf{w}] \rangle = \langle \mathbf{v}, \mathbf{w} \rangle$. Hint: Look at $\| L[\mathbf{v} + \mathbf{w}] \|^2$.

♦ 7.3.21. Let $V$ be a normed vector space. Prove that a linear map $L: V \to V$ defines an
isometry of $V$ for the given norm if and only if it maps the unit sphere $S_1 = \{ \| \mathbf{u} \| = 1 \}$
to itself: $L[S_1] = \{ L[\mathbf{u}] \mid \mathbf{u} \in S_1 \} = S_1$.

7.3.22. (a) List all linear and affine isometries of $\mathbb{R}^2$ with respect to the ∞ norm. Hint: Use
Exercise 7.3.21. (b) Can you generalize your results to $\mathbb{R}^3$?

7.3.23. Answer Exercise 7.3.22 for the 1 norm.

♥ 7.3.24. A matrix of the form $H = \begin{pmatrix} \cosh\alpha & \sinh\alpha \\ \sinh\alpha & \cosh\alpha \end{pmatrix}$ for $\alpha \in \mathbb{R}$ defines a hyperbolic rotation
of $\mathbb{R}^2$. (a) Prove that all hyperbolic rotations preserve the indefinite quadratic form
$q(\mathbf{x}) = x^2 - y^2$ in the sense that $q(H\mathbf{x}) = q(\mathbf{x})$ for all $\mathbf{x} = (x,y)^T \in \mathbb{R}^2$. Observe
that ordinary rotations preserve circles $x^2 + y^2 = a$, while hyperbolic rotations preserve
hyperbolas $x^2 - y^2 = a$. (b) Are there any other affine transformations of $\mathbb{R}^2$ that
preserve the quadratic form $q(\mathbf{x})$? Remark. The four-dimensional version of this
construction, i.e., affine maps preserving the indefinite Minkowski form $t^2 - x^2 - y^2 - z^2$,
forms the geometrical foundation for Einstein’s theory of special relativity, [55].

♦ 7.3.25. Let $\ell \subset \mathbb{R}^2$ be a line, and $\mathbf{p} \notin \ell$ a point. A perspective map
takes a point $\mathbf{x} \in \mathbb{R}^2$ to the point $\mathbf{q} \in \ell$ that is the intersection of
$\ell$ with the line going through $\mathbf{p}$ and $\mathbf{x}$. If the line is parallel to $\ell$, then
the map is not defined. Find the formula for the perspective map when
(a) $\ell$ is the x-axis and $\mathbf{p} = (0,1)^T$, (b) $\ell$ is the line $y = x$ and
$\mathbf{p} = (1,0)^T$. Is either map affine? An isometry? Remark. Mapping
three-dimensional objects onto a two-dimensional screen (or your retina)
is based on perspective maps, which are thus of fundamental importance
in art, optics, computer vision, computer graphics and animation, and computer games.


7.4  Linear  Systems

The abstract notion of a linear system serves to unify, in a common conceptual framework,
linear systems of algebraic equations, linear differential equations, both ordinary and
partial, linear boundary value problems, linear integral equations, linear control systems, and
a huge variety of other linear systems that appear in all aspects of mathematics and its
applications. The idea is simply to replace matrix multiplication by a general linear
function. Many of the structural results we learned in the matrix context have, when suitably
formulated, direct counterparts in these more general frameworks. The result is a unified
understanding of the basic properties and nature of solutions to all such linear systems.


Definition 7.25. A linear system is an equation of the form

$$L[\mathbf{u}] = \mathbf{f}, \tag{7.45}$$

in which $L: U \to V$ is a linear function between vector spaces, the right-hand side is an
element of the codomain, $\mathbf{f} \in V$, while the desired solution belongs to the domain, $\mathbf{u} \in U$.
The system is homogeneous if $\mathbf{f} = \mathbf{0}$; otherwise, it is called inhomogeneous.


Example 7.26. If $U = \mathbb{R}^n$ and $V = \mathbb{R}^m$, then, according to Theorem 7.5, every linear
function $L: \mathbb{R}^n \to \mathbb{R}^m$ is given by matrix multiplication: $L[\mathbf{u}] = A\mathbf{u}$. Therefore, in this
particular case, every linear system is a matrix system, namely $A\mathbf{u} = \mathbf{f}$.


Example 7.27. A linear ordinary differential equation takes the form $L[u] = f$, where
$L$ is an nth order linear differential operator of the form (7.15), and the right-hand side is,
say, a continuous function. Written out, the differential equation takes the familiar form

$$L[u] = a_n(x)\,\frac{d^n u}{dx^n} + a_{n-1}(x)\,\frac{d^{n-1} u}{dx^{n-1}} + \cdots + a_1(x)\,\frac{du}{dx} + a_0(x)\,u = f(x). \tag{7.46}$$

You should have already gained some familiarity with solving the constant coefficient case
as covered, for instance, in [7, 22].


Example 7.28. Let $K(x,y)$ be a function of two variables that is continuous for all
$a \le x, y \le b$. Then the integral

$$I_K[u](x) = \int_a^b K(x,y)\,u(y)\,dy$$

defines a linear operator $I_K: C^0[a,b] \to C^0[a,b]$, known as an integral transform.
Important examples include the Fourier and Laplace transforms, [61, 79]. Finding the inverse
transform requires solving a linear integral equation $I_K[u] = f$, which has the explicit form

$$\int_a^b K(x,y)\,u(y)\,dy = f(x).$$
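One way to connect the integral operator with the matrix systems of Example 7.26 is to discretize it. The sketch below is an illustration only: it assumes a midpoint quadrature rule, a hypothetical kernel $K(x,y) = xy$ on $[0,1]$, and invented function names. It replaces $I_K$ by a matrix with entries $K(x_i, y_j)\,h$ and checks linearity of the result.

```python
def midpoint_nodes(a, b, n):
    """Midpoints of n equal subintervals of [a, b], together with the spacing h."""
    h = (b - a) / n
    return [a + (j + 0.5) * h for j in range(n)], h

def kernel_matrix(K, a, b, n):
    """Matrix with entries K(x_i, y_j) * h approximating the operator I_K."""
    nodes, h = midpoint_nodes(a, b, n)
    return [[K(x, y) * h for y in nodes] for x in nodes], nodes

def apply_matrix(A, u):
    """Matrix-vector product A u."""
    return [sum(aij * uj for aij, uj in zip(row, u)) for row in A]

# Hypothetical sample data: K(x, y) = x*y on [0, 1], for which I_K[1](x) = x/2.
A, nodes = kernel_matrix(lambda x, y: x * y, 0.0, 1.0, 50)
u = [1.0 for _ in nodes]
v = [y * y for y in nodes]
Iu = apply_matrix(A, u)
print(max(abs(val - x / 2) for val, x in zip(Iu, nodes)) < 1e-12)

# Linearity of the discretized operator: A(2u + 3v) = 2 A u + 3 A v.
lhs = apply_matrix(A, [2 * ui + 3 * vi for ui, vi in zip(u, v)])
rhs = [2 * p + 3 * q for p, q in zip(Iu, apply_matrix(A, v))]
print(max(abs(l - r) for l, r in zip(lhs, rhs)) < 1e-12)
```

The midpoint rule integrates the linear function $y$ exactly, which is why the first check holds to rounding error here; for a general kernel the matrix is only an approximation of $I_K$.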


Example 7.29. We can combine linear maps to form more complicated, “mixed” types
of linear systems. For example, consider a typical initial value problem

$$u'' + u' - 2u = x, \qquad u(0) = 1, \qquad u'(0) = -1, \tag{7.47}$$

for an unknown scalar function $u(x)$. The differential equation can be written as a linear
system

$$L[u] = x, \qquad \text{where} \qquad L[u] = (D^2 + D - 2)[u] = u'' + u' - 2u$$

is a linear, constant coefficient differential operator. Further,

$$M[u] = \begin{pmatrix} L[u] \\ u(0) \\ u'(0) \end{pmatrix} = \begin{pmatrix} u''(x) + u'(x) - 2u(x) \\ u(0) \\ u'(0) \end{pmatrix}$$

defines a linear map whose domain is the space $U = C^2$ of twice continuously
differentiable functions $u(x)$, and whose image is the vector space $V$ consisting of all triples†
$\mathbf{v} = \begin{pmatrix} f(x) \\ a \\ b \end{pmatrix}$, where $f \in C^0$ is a continuous function and $a, b \in \mathbb{R}$ are real constants. You
should convince yourself that $V$ is indeed a vector space under the evident addition and
scalar multiplication operations. In this way, we can write the initial value problem (7.47)
in linear systems form as $M[u] = \mathbf{f}$, where $\mathbf{f} = (x, 1, -1)^T$.
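The map $M$ can be mimicked in code by returning a triple consisting of a function and two numbers. The following sketch is not part of the text's development: the candidate solution $u(x) = \tfrac{2}{3} e^{x} + \tfrac{7}{12} e^{-2x} - \tfrac{x}{2} - \tfrac{1}{4}$ was worked out by hand for this illustration, and the derivatives are supplied explicitly rather than computed. It checks that $M[u] = (x, 1, -1)^T$.

```python
import math

def M(u, du, d2u):
    """Return the triple (L[u], u(0), u'(0)), where L[u] = u'' + u' - 2u."""
    def Lu(x):
        return d2u(x) + du(x) - 2 * u(x)
    return (Lu, u(0.0), du(0.0))

# Candidate solution (an assumption of this sketch) and its first two derivatives.
c1, c2 = 2 / 3, 7 / 12
u = lambda x: c1 * math.exp(x) + c2 * math.exp(-2 * x) - x / 2 - 0.25
du = lambda x: c1 * math.exp(x) - 2 * c2 * math.exp(-2 * x) - 0.5
d2u = lambda x: c1 * math.exp(x) + 4 * c2 * math.exp(-2 * x)

Lu, u0, du0 = M(u, du, d2u)
samples = [-1.0, 0.0, 0.5, 2.0]
print(all(abs(Lu(x) - x) < 1e-9 for x in samples))          # first component equals x
print(abs(u0 - 1.0) < 1e-12, abs(du0 + 1.0) < 1e-12)        # initial conditions hold
```

The first component of the triple is itself a function, which is the reason the codomain $V$ is a Cartesian product of a function space with $\mathbb{R}^2$ rather than a space of ordinary vectors.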

A similar construction applies to linear boundary value problems. For example, the
boundary value problem

$$u'' + u = e^x, \qquad u(0) = 1, \qquad u(1) = 2,$$

is in the form of a linear system

$$B[u] = \mathbf{f}, \qquad \text{where} \qquad B[u] = \begin{pmatrix} u''(x) + u(x) \\ u(0) \\ u(1) \end{pmatrix}, \qquad \mathbf{f} = \begin{pmatrix} e^x \\ 1 \\ 2 \end{pmatrix}.$$

Note that $B: C^2 \to V$ defines a linear map having the same domain and codomain as the
initial value problem map $M$.


Exercises 


7.4.1. True or false: If $F[\mathbf{x}]$ is an affine transformation on $\mathbb{R}^n$, then the equation $F[\mathbf{x}] = \mathbf{c}$
defines a linear system.


7.4.2.  Place  each  of  the  following  linear  systems  in  the  form  (7.45).  Carefully  describe  the 
linear  function,  its  domain,  its  codomain,  and  the  right-hand  side  of  the  system.  Which 
systems  are  homogeneous?  (a)  3x  +  5  =  0,  (b)  x  =  y  +  z,  (c)  a  =  2b  —  3,  b  =  c  —  1, 

(d) $3(p - 2) = 2(q - 3)$, $p + q = 0$, (e) $u' + 3xu = 0$, (f) $u' + 3x = 0$, (g) $u' = u$, $u(0) = 1$,
(h) $u'' - u = e^x$, $u(0) = 3u(1)$, (i) $u'' + x^2 u = 3x$, $u(0) = 1$, $u'(0) = 0$, (j) $u' = v$,
$v' = 2u$, (k) $u'' - v'' = 2u - v$, $u(0) = v(0)$, $u(1) = v(1)$, (l) $u(x) = 1 - 3\int_0^x u(y)\,dy$,


■oo 


(m)  /  a(t)  e 


-s  t 


'0 


dt  =  1  + 
du 


'  ' /  ,,i:  rl  ,l!  "!  3  j  (°)  £  u{y)dy  =  j  yv{y)  dy, 


(P)  =  +  2?  =  l,  (<,) 


r0 


dt 


dx  dg  dg 


dx 
dx  ’ 


(r) 


0 

d2a  d2u 


dx 2  dy‘‘ 


ro 

2  ,  2 
x  +  y 


1. 


7.4.3. The Fredholm Alternative of Theorem 4.46 first appeared in the study of what are now
known as Fredholm integral equations: $u(x) + \int_a^b K(x,y)\,u(y)\,dy = f(x)$, in which $K(x,y)$
and $f(x)$ are prescribed continuous functions. Explain how the integral equation is a linear
system; i.e., describe the linear map $L$, its domain and codomain, and prove linearity.


† This is a particular case of the general Cartesian product construction between vector spaces;
here $V = C^0 \times \mathbb{R}^2$. See Exercise 2.1.13 for details.
7.4.4. Answer Exercise 7.4.3 for the Volterra integral equation $u(t) + \int_a^t K(t,s)\,u(s)\,ds = f(t)$,
where $a \le t \le b$.

7.4.5. (a) Prove that the solution to the linear integral equation $u(t) = a + \int_0^t k(s)\,u(s)\,ds$
solves the linear initial value problem $\dfrac{du}{dt} = k(t)\,u(t)$, $u(0) = a$.
(b) Use part (a) to solve the following integral equations:
(i) $u(t) = 2 - \int_0^t u(s)\,ds$, (ii) $u(t) = 1 + 2\int_1^t s\,u(s)\,ds$, (iii) $u(t) = 3 + \int_0^t e^s u(s)\,ds$.


The  Superposition  Principle 

Before  attempting  to  tackle  general  inhomogeneous  linear  systems,  we  should  look  first  at 
the  homogeneous  version.  The  most  important  fact  is  that  homogeneous  linear  systems 
admit  a  superposition  principle  that  allows  one  to  construct  new  solutions  from  known 
solutions.  Recall  that  the  word  “superposition”  refers  to  taking  linear  combinations  of 
solutions. 

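Consider a general homogeneous linear system

$$L[\mathbf{z}] = \mathbf{0}, \tag{7.48}$$

where $L: U \to V$ is a linear function. If we are given two solutions, say $\mathbf{z}_1$ and $\mathbf{z}_2$, meaning
that

$$L[\mathbf{z}_1] = \mathbf{0}, \qquad L[\mathbf{z}_2] = \mathbf{0},$$

then their sum $\mathbf{z}_1 + \mathbf{z}_2$ is automatically a solution, since, in view of the linearity of $L$,

$$L[\mathbf{z}_1 + \mathbf{z}_2] = L[\mathbf{z}_1] + L[\mathbf{z}_2] = \mathbf{0} + \mathbf{0} = \mathbf{0}.$$

Similarly, given a solution $\mathbf{z}$ and any scalar $c$, the scalar multiple $c\,\mathbf{z}$ is automatically a
solution, since

$$L[c\,\mathbf{z}] = c\,L[\mathbf{z}] = c\,\mathbf{0} = \mathbf{0}.$$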


Combining  these  two  elementary  observations,  we  can  now  state  the  general  superposition 
principle.  The  proof  is  an  immediate  consequence  of  formula  (7.4). 


Theorem 7.30. If $\mathbf{z}_1, \ldots, \mathbf{z}_k$ are all solutions to the same homogeneous linear system
$L[\mathbf{z}] = \mathbf{0}$, then every linear combination $c_1 \mathbf{z}_1 + \cdots + c_k \mathbf{z}_k$ is also a solution.
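In the matrix case the superposition principle is easy to test by direct computation. The sketch below uses a small hypothetical matrix and two kernel vectors chosen for illustration; the helper names are not from the text.

```python
def matvec(A, z):
    """Matrix-vector product A z for a matrix stored as a list of rows."""
    return [sum(a * zi for a, zi in zip(row, z)) for row in A]

def combine(coeffs, vectors):
    """Linear combination c_1 z_1 + ... + c_k z_k of equal-length vectors."""
    n = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n)]

# Hypothetical sample data: a rank-one matrix and two solutions of A z = 0.
A = [[1, 2, 3], [2, 4, 6]]
z1, z2 = [-2, 1, 0], [-3, 0, 1]

print(matvec(A, z1), matvec(A, z2))            # [0, 0] [0, 0]
print(matvec(A, combine([5, -7], [z1, z2])))   # [0, 0]
```

Any other choice of coefficients in `combine` gives the zero vector as well, which is exactly the statement of the theorem for $L[\mathbf{z}] = A\mathbf{z}$.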


As  with  matrices,  we  call  the  solution  space  to  the  homogeneous  linear  system  (7.48) 
the  kernel  of  the  linear  function  L.  The  superposition  principle  implies  that  the  kernel 
always  forms  a  subspace. 


Proposition 7.31. If $L: U \to V$ is a linear function, then its kernel

$$\ker L = \{\, \mathbf{z} \in U \mid L[\mathbf{z}] = \mathbf{0} \,\} \subset U \tag{7.49}$$

is a subspace of the domain space $U$.


As  we  know,  in  the  case  of  linear  matrix  systems,  the  kernel  can  be  explicitly  determined 
by  applying  the  usual  Gaussian  Elimination  algorithm.  To  solve  more  general  homogeneous 
linear  systems,  e.g.,  linear  differential  equations,  one  must  develop  appropriate  analytical 
solution  techniques. 
Example 7.32. Consider the second order linear differential operator

$$L = D^2 - 2D - 3, \tag{7.50}$$

which maps the function $u(x)$ to the function

$$L[u] = (D^2 - 2D - 3)[u] = u'' - 2u' - 3u.$$

The associated homogeneous system takes the form of a homogeneous, linear, constant
coefficient second order ordinary differential equation

$$L[u] = u'' - 2u' - 3u = 0. \tag{7.51}$$

In accordance with the standard solution method, we plug the exponential ansatz†
$u = e^{\lambda x}$ into the equation. The result is

$$L[e^{\lambda x}] = D^2[e^{\lambda x}] - 2D[e^{\lambda x}] - 3e^{\lambda x} = \lambda^2 e^{\lambda x} - 2\lambda e^{\lambda x} - 3e^{\lambda x} = (\lambda^2 - 2\lambda - 3)\,e^{\lambda x}.$$

Therefore, $u = e^{\lambda x}$ is a solution if and only if $\lambda$ satisfies the characteristic equation

$$0 = \lambda^2 - 2\lambda - 3 = (\lambda - 3)(\lambda + 1).$$

The two roots are $\lambda_1 = 3$, $\lambda_2 = -1$, and hence

$$u_1(x) = e^{3x}, \qquad u_2(x) = e^{-x}, \tag{7.52}$$

are two linearly independent solutions of (7.51). According to the general superposition
principle, every linear combination

$$u(x) = c_1 u_1(x) + c_2 u_2(x) = c_1 e^{3x} + c_2 e^{-x} \tag{7.53}$$

of these two basic solutions is also a solution, for any choice of constants $c_1, c_2$. In fact,
this two-parameter family (7.53) constitutes the most general solution to the ordinary
differential equation (7.51); indeed, this is a consequence of Theorem 7.34 below. Thus,
the kernel of the second order differential operator (7.50) is two-dimensional, with basis
given by the independent exponential solutions (7.52).
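The computation in this example can be replayed numerically. The sketch below is an illustration with assumed helper names and arbitrarily chosen constants: it finds the roots of the characteristic equation with the quadratic formula and then evaluates $u'' - 2u' - 3u$ for a linear combination of the two exponentials, using the exact derivatives of $e^{\lambda x}$.

```python
import math

def real_roots(a, b, c):
    """Roots of a*t**2 + b*t + c = 0, assuming a positive discriminant."""
    r = math.sqrt(b * b - 4 * a * c)
    return ((-b + r) / (2 * a), (-b - r) / (2 * a))

lam1, lam2 = real_roots(1.0, -2.0, -3.0)
print(lam1, lam2)   # 3.0 -1.0

def residual(c1, c2, x):
    """Evaluate u'' - 2u' - 3u for u = c1*exp(lam1*x) + c2*exp(lam2*x)."""
    e1, e2 = math.exp(lam1 * x), math.exp(lam2 * x)
    u = c1 * e1 + c2 * e2
    du = c1 * lam1 * e1 + c2 * lam2 * e2
    d2u = c1 * lam1 ** 2 * e1 + c2 * lam2 ** 2 * e2
    return d2u - 2 * du - 3 * u

# The constants 2 and -5 are arbitrary; any pair gives a residual of zero up to rounding.
print(all(abs(residual(2.0, -5.0, x)) < 1e-9 for x in (-1.0, 0.0, 0.5, 1.0)))
```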


In general, the solution space to an nth order homogeneous linear ordinary differential
equation

$$a_n(x)\,\frac{d^n u}{dx^n} + a_{n-1}(x)\,\frac{d^{n-1} u}{dx^{n-1}} + \cdots + a_1(x)\,\frac{du}{dx} + a_0(x)\,u = 0 \tag{7.54}$$

is a subspace of the vector space $C^n(a,b)$ of n times continuously differentiable functions
defined on an open interval‡ $a < x < b$, since it is just the kernel of a linear differential
operator $L: C^n(a,b) \to C^0(a,b)$. This implies that linear combinations of solutions are
also solutions. To determine the number of solutions, or, more precisely, the dimension of
the solution space, we need to impose some mild restrictions on the differential operator.

† The German word Ansatz refers to the method of finding a solution to a complicated equation
by guessing the solution’s form in advance. Typically, one is not clever enough to guess the precise
solution, and so the ansatz will have one or more free parameters (in this case the constant
exponent $\lambda$) that, with some luck, can be rigged up to fulfill the requirements imposed by the
equation. Thus, a reasonable English translation of “ansatz” is “inspired guess”.

‡ We allow $a$ and/or $b$ to be infinite.

Definition 7.33. A differential operator $L$ given by (7.54) is called nonsingular on an open
interval $(a,b)$ if all its coefficients are continuous functions, so $a_n(x), \ldots, a_0(x) \in C^0(a,b)$,
and its leading coefficient does not vanish: $a_n(x) \ne 0$ for all $a < x < b$.

The  basic  existence  and  uniqueness  theorems  governing  nonsingular  homogeneous  linear 
ordinary  differential  equations  can  be  reformulated  as  a  characterization  of  the  dimension 
of  the  solution  space. 


Theorem 7.34. The kernel of a nonsingular nth order ordinary differential operator is an
n-dimensional subspace $\ker L \subset C^n(a,b)$.


A proof of this theorem relies on the fundamental existence and uniqueness theorems
for ordinary differential equations, and can be found in [7, 36]. The fact that the kernel
has dimension $n$ means that it has a basis consisting of $n$ linearly independent solutions
$u_1(x), \ldots, u_n(x) \in C^n(a,b)$ with the property that every solution to the homogeneous
differential equation (7.54) is given by a linear combination

$$u(x) = c_1 u_1(x) + \cdots + c_n u_n(x),$$

where $c_1, \ldots, c_n$ are arbitrary constants. Therefore, once we find $n$ linearly independent
solutions of an nth order homogeneous linear ordinary differential equation, we can
immediately write down its most general solution.

The condition that the leading coefficient $a_n(x) \ne 0$ is essential. Points where $a_n(x) = 0$
are known as singular points. Singular points show up in many applications, and must be
treated separately and with care, [7, 22, 61]. Of course, if the coefficients are constant,
then there is nothing to worry about: either the leading coefficient is nonzero, $a_n \ne 0$, or
the differential equation is, in fact, of lower order than advertised. Here is the prototypical
example of an ordinary differential equation with a singular point.

Example 7.35. A second order Euler differential equation takes the form

$$E[u] = a x^2 u'' + b x u' + c u = 0, \tag{7.55}$$

where $a \ne 0$ and $b, c$ are constants. Here $E = a x^2 D^2 + b x D + c$ is a second order variable
coefficient linear differential operator. Instead of the exponential solution ansatz used in
the constant coefficient case, Euler equations are solved by using a power ansatz

$$u(x) = x^r \tag{7.56}$$

with unknown exponent $r$. Substituting into the differential equation, we find

$$E[x^r] = a x^2 D^2[x^r] + b x D[x^r] + c x^r = a r (r - 1) x^r + b r x^r + c x^r = [\,a r (r - 1) + b r + c\,]\, x^r.$$

Thus, $x^r$ is a solution if and only if $r$ satisfies the characteristic equation

$$a r (r - 1) + b r + c = a r^2 + (b - a) r + c = 0. \tag{7.57}$$

If the quadratic characteristic equation has two distinct real roots, $r_1 \ne r_2$, then we obtain
two linearly independent solutions $u_1(x) = x^{r_1}$ and $u_2(x) = x^{r_2}$, and so the general (real)
solution to (7.55) has the form

$$u(x) = c_1 |x|^{r_1} + c_2 |x|^{r_2}. \tag{7.58}$$

(The absolute values are usually needed to ensure that the solutions remain real when
$x < 0$.) The other cases, repeated roots and complex roots, will be discussed below.

The Euler equation has a singular point at $x = 0$, where its leading coefficient vanishes.
Theorem 7.34 assures us that the differential equation has a two-dimensional solution
space on every interval not containing the singular point. However, predicting the number
of solutions that remain continuously differentiable at $x = 0$ is not so easy, since it depends
on the values of the exponents $r_1$ and $r_2$. For instance, the case

$$x^2 u'' - 3 x u' + 3 u = 0 \qquad \text{has general solution} \qquad u = c_1 x + c_2 x^3,$$

which forms a two-dimensional subspace of $C^0(\mathbb{R})$. However,

$$x^2 u'' + x u' - u = 0 \qquad \text{has general solution} \qquad u = c_1 x + \frac{c_2}{x},$$

and only the multiples of the first solution $x$ are continuous at $x = 0$. Therefore, the
solutions that are continuous everywhere form only a one-dimensional subspace of $C^0(\mathbb{R})$.
Finally,

$$x^2 u'' + 5 x u' + 3 u = 0 \qquad \text{has general solution} \qquad u = \frac{c_1}{x} + \frac{c_2}{x^3}.$$

In this case, there are no nontrivial solutions $u(x) \not\equiv 0$ that are continuous at $x = 0$, and
so the space of solutions defined on all of $\mathbb{R}$ is zero-dimensional.
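The exponents in the three sample equations come from the characteristic equation (7.57), and can be recomputed mechanically. The following sketch, with an assumed helper name, handles only the case of two distinct real roots and simply reports the exponents $r_1, r_2$ for each coefficient triple $(a, b, c)$.

```python
import math

def euler_exponents(a, b, c):
    """Real roots of a*r**2 + (b - a)*r + c = 0; assumes two distinct real roots."""
    B = b - a
    disc = B * B - 4 * a * c
    if disc <= 0:
        raise ValueError("repeated or complex roots are not handled in this sketch")
    s = math.sqrt(disc)
    return ((-B + s) / (2 * a), (-B - s) / (2 * a))

# Coefficients (a, b, c) of the three Euler equations discussed above.
for a, b, c in [(1, -3, 3), (1, 1, -1), (1, 5, 3)]:
    print((a, b, c), euler_exponents(a, b, c))
# (1, -3, 3) (3.0, 1.0)
# (1, 1, -1) (1.0, -1.0)
# (1, 5, 3) (-1.0, -3.0)
```

A negative exponent signals a power solution that blows up at the singular point $x = 0$, which matches the count of continuous solutions in each of the three cases.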

The  superposition  principle  is  equally  valid  in  the  study  of  homogeneous  linear  partial 
differential  equations.  Here  is  a  particularly  noteworthy  example. 


Example 7.36. Consider the Laplace equation

$$\Delta[u] = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \tag{7.59}$$

for a function $u(x,y)$ defined on a domain $\Omega \subset \mathbb{R}^2$. The Laplace equation is named
after the renowned eighteenth-century French mathematician Pierre-Simon Laplace, and
is the most important partial differential equation. Its applications range over almost all
fields of mathematics, physics, and engineering, including complex analysis, differential
geometry, fluid mechanics, electromagnetism, elasticity, thermodynamics, and quantum
mechanics, [61]. The Laplace equation is a homogeneous linear partial differential equation
corresponding to the partial differential operator $\Delta = \partial_x^2 + \partial_y^2$ known as the Laplacian.
Linearity can either be proved directly, or by noting that $\Delta$ is built up from the basic linear
partial derivative operators $\partial_x, \partial_y$ by the processes of composition and addition, as detailed
in Exercise 7.1.46. Solutions to the Laplace equation are known as harmonic functions.

Unlike homogeneous linear ordinary differential equations, there is an infinite number
of linearly independent solutions to the Laplace equation. Examples include the
trigonometric/exponential solutions

$$e^{\omega x} \cos \omega y, \qquad e^{\omega x} \sin \omega y, \qquad e^{\omega y} \cos \omega x, \qquad e^{\omega y} \sin \omega x,$$

where $\omega$ is any real constant. There are also infinitely many independent harmonic
polynomial solutions, the first few of which are

$$1, \qquad x, \qquad y, \qquad x^2 - y^2, \qquad x y, \qquad x^3 - 3 x y^2, \qquad \ldots$$

The reader might enjoy finding some more polynomial solutions and trying to spot the
pattern. (The answer will appear shortly.) As usual, we can build up more complicated
solutions by taking general linear combinations of these particular ones; for instance,
$u(x,y) = 1 - 4xy + 2e^{3x} \cos 3y$ is automatically a solution. See [61] for further
developments.
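Harmonic polynomials can be tested exactly by differentiating term by term. In the sketch below a polynomial is stored as a dictionary mapping exponent pairs $(i, j)$ to the coefficient of $x^i y^j$; this representation and the function name are assumptions made for the illustration.

```python
def laplacian(p):
    """Exact Laplacian of a polynomial stored as {(i, j): coeff} for coeff * x**i * y**j."""
    out = {}
    for (i, j), c in p.items():
        if i >= 2:   # second x-derivative of x**i is i*(i-1)*x**(i-2)
            out[(i - 2, j)] = out.get((i - 2, j), 0) + c * i * (i - 1)
        if j >= 2:   # second y-derivative of y**j is j*(j-1)*y**(j-2)
            out[(i, j - 2)] = out.get((i, j - 2), 0) + c * j * (j - 1)
    return {k: v for k, v in out.items() if v != 0}

# The polynomials 1, x, y, x^2 - y^2, xy, x^3 - 3xy^2.
harmonic = [
    {(0, 0): 1}, {(1, 0): 1}, {(0, 1): 1},
    {(2, 0): 1, (0, 2): -1}, {(1, 1): 1}, {(3, 0): 1, (1, 2): -3},
]
print(all(laplacian(p) == {} for p in harmonic))   # True
print(laplacian({(2, 0): 1, (0, 2): 1}))           # {(0, 0): 4}, so x^2 + y^2 is not harmonic
```

The same function can be used to search for further harmonic polynomials of higher degree by trial.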


Exercises 


7.4.6. Solve the following homogeneous linear ordinary differential equations. What is the
dimension of the solution space? (a) $u'' - 4u = 0$, (b) $u'' - 6u' + 8u = 0$,
(c) $u''' - 9u' = 0$, (d) $u'''' + 4u''' - u'' - 16u' - 12u = 0$.

7.4.7. Define $L[y] = y'' + y$. (a) Prove directly from the definition that $L: C^2[a,b] \to C^0[a,b]$
is a linear transformation. (b) Determine $\ker L$.

7.4.8. Answer Exercise 7.4.7 when $L = 3D^2 - 2D - 5$.

7.4.9. Consider the linear differential equation $y''' + 5y'' + 3y' - 9y = 0$. (a) Write the equation
in the form $L[y] = 0$ for a differential operator $L = p(D)$. (b) Find a basis for $\ker L$, and
then write out the general solution to the differential equation.


7.4.10.  The  following  functions  are  solutions  to  a  constant  coefficient  homogeneous  scalar 
ordinary  differential  equation,  (i)  Determine  the  least  possible  order  of  the  differential 
equation,  and  (ii)  write  down  an  appropriate  differential  equation. 

(a) $e^{2x} + e^{-3x}$, (b) $1 + e^{-x}$, (c) $x e^x$, (d) $e^x + 2e^{2x} + 3e^{3x}$.

7.4.11. Solve the following Euler differential equations: (a) $x^2 u'' + 5xu' - 5u = 0$,
(b) $2x^2 u'' - xu' - 2u = 0$, (c) $x^2 u'' - u = 0$, (d) $x^2 u'' + xu' - 3u = 0$,
(e) $3x^2 u'' - 5xu' - 3u = 0$, (f) $\dfrac{d^2 u}{dx^2} + \dfrac{2}{x}\,\dfrac{du}{dx} = 0$.

7.4.12. Solve the third order Euler differential equation $x^3 u''' + 2x^2 u'' - 3xu' + 3u = 0$ by
using the power ansatz (7.56). What is the dimension of the solution space for $x > 0$?
For all $x$?

7.4.13. (i) Show that if $u(x)$ solves the Euler equation $a x^2 \dfrac{d^2 u}{dx^2} + b x \dfrac{du}{dx} + c u = 0$, then
$v(t) = u(e^t)$ solves a linear, constant coefficient differential equation. (ii) Use this
alternative technique to solve the Euler differential equations in Exercise 7.4.11.


♦ 7.4.14. (a) Use the method in Exercise 7.4.13 to solve an Euler equation whose characteristic
equation has a double root $r_1 = r_2 = r$. (b) Solve the specific equations
(i) $x^2 u'' - x u' + u = 0$, (ii) $\dfrac{d^2 u}{dx^2} + \dfrac{1}{x}\,\dfrac{du}{dx} = 0$.


7.4.15. Show that if $u(x)$ solves $x u'' + 2u' - 4xu = 0$, then $v(x) = x\,u(x)$ solves a
linear, constant coefficient equation. Use this to find the general solution to the given
differential equation. Which of your solutions are continuous at the singular point $x = 0$?
Differentiable?

7.4.16. Let $S \subset \mathbb{R}$ be an open subset (i.e., a union of open intervals), and let $D: C^1(S) \to C^0(S)$
be the derivative operator $D[f] = f'$. True or false: $\ker D$ is a one-dimensional
subspace of $C^1(S)$.

7.4.17. Show that $\log(x^2 + y^2)$ and $\dfrac{x}{x^2 + y^2}$ are harmonic functions, that is, solutions of the
two-dimensional Laplace equation.
7.4.18. Find all solutions $u = f(r)$ of the two-dimensional Laplace equation that depend only
on the radial coordinate $r = \sqrt{x^2 + y^2}$. Do these solutions form a vector space? If so, what
is its dimension?


7.4.19.  Find  all  (real)  solutions  to  the  two-dimensional  Laplace  equation  of  the  form 

u  =  log p(x,y),  where  p(x,y)  is  a  quadratic  polynomial.  Do  these  solutions  form  a  vector 
space?  If  so,  what  is  its  dimension? 


♥ 7.4.20. (a) Show that the function $e^x \cos y$ is a solution to the two-dimensional Laplace
equation. (b) Show that its quadratic Taylor polynomial at $x = y = 0$ is harmonic.
(c) What about its degree 3 Taylor polynomial? (d) Can you state a general theorem?
(e) Test your result by looking at the Taylor polynomials of the harmonic function
$\log\left[ (x - 1)^2 + y^2 \right]$.

7.4.21. (a) Find a basis for, and the dimension of, the vector space consisting of all quadratic
polynomial solutions of the three-dimensional Laplace equation $\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} + \dfrac{\partial^2 u}{\partial z^2} = 0$.
(b) Do the same for the homogeneous cubic polynomial solutions.

7.4.22. Find all solutions $u = f(r)$ of the three-dimensional Laplace equation
$\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} + \dfrac{\partial^2 u}{\partial z^2} = 0$ that depend only on the radial coordinate $r = \sqrt{x^2 + y^2 + z^2}$.
Do these solutions form a vector space? If so, what is its dimension?

7.4.23. Let $L, M$ be linear functions. (a) Prove that $\ker (L \circ M) \supseteq \ker M$. (b) Find an
example in which $\ker (L \circ M) \ne \ker M$.


Inhomogeneous  Systems 

Now we turn our attention to inhomogeneous linear systems

$$L[\mathbf{u}] = \mathbf{f}, \tag{7.60}$$

where $L: U \to V$ is a linear function, $\mathbf{f} \in V$, and the desired solution $\mathbf{u} \in U$. Unless $\mathbf{f} = \mathbf{0}$,
the set of solutions to (7.60) is not a subspace of $U$, but, rather, forms an affine subspace,
as defined in Exercise 2.2.28. Here, the crucial question is existence: is there a solution to
the system? In contrast, for the homogeneous system $L[\mathbf{z}] = \mathbf{0}$, existence is not an issue,
since $\mathbf{0}$ is always a solution. The key question for homogeneous systems is uniqueness:
either $\ker L = \{\mathbf{0}\}$, in which case $\mathbf{0}$ is the only solution, or $\ker L \ne \{\mathbf{0}\}$, in which case there
are infinitely many nontrivial solutions $\mathbf{0} \ne \mathbf{z} \in \ker L$.

In the matrix case, the compatibility of an inhomogeneous system $A\mathbf{x} = \mathbf{b}$, which
was required for the existence of a solution, led to the general definition of the image of
a matrix, which we copy verbatim for linear functions.


Definition 7.37. The image of a linear function $L\colon U \to V$ is the subspace

$$\operatorname{img} L = \{\, L[u] \mid u \in U \,\} \subset V.$$

The proof that $\operatorname{img} L$ is a subspace of the codomain is straightforward: If $f = L[u]$
and $g = L[v]$ are any two elements of the image, so is any linear combination, since, by
linearity,

$$c f + d g = c L[u] + d L[v] = L[c u + d v] \in \operatorname{img} L.$$


For example, if $L[u] = A u$ is given by multiplication by an $m \times n$ matrix, then its image
is the subspace $\operatorname{img} L = \operatorname{img} A \subset \mathbb{R}^m$ spanned by the columns of $A$, that is, the column space
of the coefficient matrix. When $L$ is a linear differential operator, or more general linear
operator, characterizing its image can be a much more challenging problem.
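For a matrix, membership of a right-hand side in the image can be tested mechanically: $b$ lies in $\operatorname{img} A$ exactly when appending $b$ to $A$ as an extra column does not raise the rank. The following Python sketch illustrates this with exact rational arithmetic. The helper names `rank` and `in_image` and the sample matrix are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows), by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    ncols = len(M[0]) if M else 0
    for col in range(ncols):
        # Look for a nonzero pivot in this column, at or below row r.
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def in_image(A, b):
    """True when b lies in the column space of A, i.e. A x = b is compatible."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)


# Hypothetical singular matrix: its image is the line spanned by (1, 2)^T.
A = [[1, 2], [2, 4]]
print(in_image(A, [3, 6]))  # (3, 6)^T is a multiple of (1, 2)^T
print(in_image(A, [3, 7]))  # (3, 7)^T is not
```

For differential operators no such finite test is available in general, which is why characterizing the image is harder there.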

The  fundamental  theorem  regarding  solutions  to  inhomogeneous  linear  equations  exactly 
mimics  our  earlier  result,  Theorem  2.39,  for  matrix  systems. 


Theorem 7.38. Let $L\colon U \to V$ be a linear function. Let $f \in V$. Then the inhomogeneous
linear system

$$L[u] = f \tag{7.61}$$

has a solution if and only if $f \in \operatorname{img} L$. In this case, the general solution to the system has
the form

$$u = u^* + z, \tag{7.62}$$

where $u^*$ is a particular solution, so $L[u^*] = f$, and $z$ is any element of $\ker L$, i.e., a solution
to the corresponding homogeneous system

$$L[z] = 0. \tag{7.63}$$


Proof: We merely repeat the proof of Theorem 2.39. The existence condition $f \in \operatorname{img} L$
is an immediate consequence of the definition of the image. Suppose $u^*$ is a particular
solution to (7.61). If $z$ is a solution to (7.63), then, by linearity,

$$L[u^* + z] = L[u^*] + L[z] = f + 0 = f,$$

and hence $u^* + z$ is also a solution to (7.61). To show that every solution has this form,
let $u$ be a second solution, so that $L[u] = f$. Setting $z = u - u^*$, we find that

$$L[z] = L[u - u^*] = L[u] - L[u^*] = f - f = 0.$$

Therefore $z \in \ker L$, and so $u$ has the proper form (7.62). Q.E.D.


Corollary 7.39. The inhomogeneous linear system (7.61) has a unique solution if and
only if $f \in \operatorname{img} L$ and $\ker L = \{0\}$.


Therefore, to prove that a linear system has a unique solution, we first need to prove an
existence result, that there is at least one solution, which requires the right-hand side $f$ to lie
in the image of the operator $L$, and then a uniqueness result, that the only solution to the
homogeneous system $L[z] = 0$ is the trivial zero solution $z = 0$. Observe that whenever an
inhomogeneous system $L[u] = f$ has a unique solution, then every other inhomogeneous
system $L[u] = g$ that is defined by the same linear function also has a unique solution,
provided $g \in \operatorname{img} L$. In other words, uniqueness does not depend upon the external forcing,
although existence might.
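The decomposition (7.62) is easy to watch in action for a small matrix system. The sketch below uses a hypothetical singular $2 \times 2$ system, chosen only for illustration, and checks that every vector of the form $u^* + t z$ with $z \in \ker A$ solves $A u = b$. The names `matvec` and `general_solution` are assumptions made for this example.

```python
from fractions import Fraction


def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical compatible but singular system A u = b.
A = [[1, 2], [2, 4]]
b = [3, 6]

u_star = [3, 0]   # one particular solution: A u* = b
z = [-2, 1]       # spans ker A: A z = 0


def general_solution(t):
    """The solution u = u* + t z of A u = b, for an arbitrary scalar t."""
    return [us + t * zi for us, zi in zip(u_star, z)]


# Every choice of t gives a solution, so the solution is not unique here.
for t in (Fraction(-5, 2), 0, 1, 7):
    assert matvec(A, general_solution(t)) == b
```

Because $\ker A \neq \{0\}$ in this example, Corollary 7.39 rules out uniqueness, and the loop confirms that the solutions form a whole line.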


Remark.  In  physical  systems,  the  inhomogeneity  f  typically  corresponds  to  an  external 
force.  The  decomposition  formula  (7.62)  states  that  its  effect  on  the  linear  system  can 
be  viewed  as  a  combination  of  one  specific  response  u*  to  the  forcing  and  the  system’s 
internal,  unencumbered  motion,  as  represented  by  the  homogeneous  solution  z.  Keep  in 
mind  that  the  particular  solution  is  not  uniquely  defined  (unless  kerL  =  {0}),  and  any 
one  solution  can  serve  in  this  role. 


Example 7.40. Consider the inhomogeneous linear second order differential equation

$$u'' + u' - 2u = x. \tag{7.64}$$

Note that this can be written in the linear system form

$$L[u] = x, \qquad \text{where} \qquad L = D^2 + D - 2$$

is a linear second order differential operator. The kernel of the differential operator $L$ is
found by solving the associated homogeneous linear equation

$$L[z] = z'' + z' - 2z = 0. \tag{7.65}$$


Applying the usual solution method, we find that the homogeneous differential equation
(7.65) has a two-dimensional solution space, with basis functions

$$z_1(x) = e^{-2x}, \qquad z_2(x) = e^{x}.$$

Therefore, the general element of $\ker L$ is a linear combination

$$z(x) = c_1 z_1(x) + c_2 z_2(x) = c_1 e^{-2x} + c_2 e^{x}.$$

To find a particular solution to the inhomogeneous differential equation (7.64), we rely
on the method of undetermined coefficients†. We introduce the solution ansatz $u = a x + b$,
and compute

$$L[u] = L[a x + b] = a - 2(a x + b) = -2 a x + (a - 2b) = x.$$

Equating the coefficients of $x$ and $1$, and then solving for $a = -\tfrac12$, $b = -\tfrac14$, we deduce
that

$$u^*(x) = -\tfrac12 x - \tfrac14$$

is a particular solution to the inhomogeneous differential equation. Theorem 7.38 then says
that the general solution is

$$u(x) = u^*(x) + z(x) = -\tfrac12 x - \tfrac14 + c_1 e^{-2x} + c_2 e^{x}.$$
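As a sanity check on Example 7.40, the general solution can be substituted back into the equation $u'' + u' - 2u = x$. The sketch below does this numerically with hand-computed derivatives. The function names and the sample values of $c_1$, $c_2$ are arbitrary choices made for illustration.

```python
import math


def u(x, c1, c2):
    """General solution u = -x/2 - 1/4 + c1 e^{-2x} + c2 e^{x}."""
    return -x / 2 - 0.25 + c1 * math.exp(-2 * x) + c2 * math.exp(x)


def du(x, c1, c2):
    """First derivative of u, computed by hand."""
    return -0.5 - 2 * c1 * math.exp(-2 * x) + c2 * math.exp(x)


def d2u(x, c1, c2):
    """Second derivative of u, computed by hand."""
    return 4 * c1 * math.exp(-2 * x) + c2 * math.exp(x)


def residual(x, c1, c2):
    """Left-hand side minus right-hand side of u'' + u' - 2u = x."""
    return d2u(x, c1, c2) + du(x, c1, c2) - 2 * u(x, c1, c2) - x


# The residual vanishes (up to rounding) for any constants c1, c2.
for c1, c2 in [(0.0, 0.0), (1.0, -2.0), (3.5, 0.25)]:
    for x in (-1.0, 0.0, 0.5, 2.0):
        assert abs(residual(x, c1, c2)) < 1e-9
```

Setting $c_1 = c_2 = 0$ isolates the particular solution $u^*$, while the remaining terms contribute nothing to the residual because they lie in $\ker L$.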


Example 7.41. By inspection, we see that

$$u(x, y) = -\tfrac12 \sin(x + y)$$

is a solution to the particular Poisson equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = \sin(x + y). \tag{7.66}$$

Theorem 7.38 implies that every solution to this inhomogeneous version of the Laplace
equation (7.59) takes the form

$$u(x, y) = -\tfrac12 \sin(x + y) + z(x, y),$$

where $z(x, y)$ is an arbitrary harmonic function, i.e., a solution to the homogeneous Laplace
equation.


† One could also employ the method of variation of parameters, although usually the undetermined
coefficient method, when applicable, is the more straightforward of the two. Details can be
found in most ordinary differential equations texts, including [7,22].

Example 7.42. Let us solve the second order linear boundary value problem

$$u'' + u = x, \qquad u(0) = 0, \qquad u(\pi) = 0. \tag{7.67}$$

As with initial value problems, the first step is to solve the differential equation. To this
end, we first solve the corresponding homogeneous differential equation $z'' + z = 0$. The
usual method (see [7] or Example 7.50 below) shows that $\cos x$ and $\sin x$ form a basis for
its solution space. The method of undetermined coefficients then produces the particular
solution $u^*(x) = x$ to the inhomogeneous differential equation, and so its general solution
is

$$u(x) = x + c_1 \cos x + c_2 \sin x. \tag{7.68}$$


The next step is to see whether any solutions also satisfy the boundary conditions. Plugging
formula (7.68) into the boundary conditions yields

$$u(0) = c_1 = 0, \qquad u(\pi) = \pi - c_1 = 0.$$

However, these two conditions are incompatible, and so there is no solution to the linear
system (7.67). The function $f(x) = x$ does not lie in the image of the differential operator
$L[u] = u'' + u$ when $u$ is subjected to the boundary conditions. Or, to state it another way,
$(x, 0, 0)^T$ does not belong to the image of the linear operator $M[u] = (u'' + u, u(0), u(\pi))^T$
defining the boundary value problem.

On the other hand, if we slightly modify the inhomogeneity, the boundary value problem

$$u'' + u = x - \tfrac12 \pi, \qquad u(0) = 0, \qquad u(\pi) = 0, \tag{7.69}$$

does admit a solution, but it fails to be unique. Applying the preceding solution techniques,
we find that

$$u(x) = x - \tfrac12 \pi + \tfrac12 \pi \cos x + c \sin x$$

solves the system for any choice of constant $c$, and so the boundary value problem (7.69)
admits infinitely many solutions. Observe that $z(x) = \sin x$ is a basis for the kernel or
solution space of the corresponding homogeneous boundary value problem

$$z'' + z = 0, \qquad z(0) = 0, \qquad z(\pi) = 0,$$

while $u^*(x) = x - \tfrac12 \pi + \tfrac12 \pi \cos x$ represents a particular solution to the inhomogeneous
system. Thus, $u(x) = u^*(x) + c\, z(x)$, in conformity with the general formula (7.62).
Incidentally, if we modify the interval of definition, considering

$$u'' + u = f(x), \qquad u(0) = 0, \qquad u(\tfrac12 \pi) = 0, \tag{7.70}$$

then the homogeneous boundary value problem, with $f(x) = 0$, has only the trivial solution,
and so the inhomogeneous system admits a unique solution for any inhomogeneity $f(x)$.
For example, if $f(x) = x$, then

$$u(x) = x - \tfrac12 \pi \sin x \tag{7.71}$$

is the unique solution to the resulting boundary value problem.
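The three outcomes in Example 7.42 (no solution, infinitely many, unique) all come from the same two linear equations for $c_1$, $c_2$ in the general solution $u = u_p + c_1 \cos x + c_2 \sin x$. The following sketch classifies the boundary value problem $u'' + u = f$, $u(0) = u(\ell) = 0$ from a known particular solution $u_p$. The function name `classify_bvp`, its return convention, and the tolerance are assumptions made for this illustration.

```python
import math


def classify_bvp(u_part, ell, tol=1e-12):
    """Classify u'' + u = f, u(0) = u(ell) = 0, given a particular solution u_part.

    The general solution is u_part(x) + c1 cos x + c2 sin x.
    Returns (status, (c1, c2)); c2 is None when it is a free parameter.
    """
    c1 = -u_part(0.0)                          # forced by u(0) = 0
    rhs = -u_part(ell) - c1 * math.cos(ell)    # remaining equation: c2 sin(ell) = rhs
    s = math.sin(ell)
    if abs(s) > tol:
        return "unique", (c1, rhs / s)
    if abs(rhs) > tol:
        return "none", None
    return "infinite", (c1, None)


print(classify_bvp(lambda x: x, math.pi))                 # problem (7.67)
print(classify_bvp(lambda x: x - math.pi / 2, math.pi))   # problem (7.69)
print(classify_bvp(lambda x: x, math.pi / 2))             # problem (7.70) with f(x) = x
```

On the interval $[0, \tfrac12\pi]$ the coefficient $\sin \ell$ is nonzero, which is the numerical counterpart of the homogeneous problem having only the trivial solution.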


This example highlights some crucial differences between boundary value problems and
initial value problems for ordinary differential equations. Nonsingular initial value problems
have a unique solution for every suitable set of initial conditions. Boundary value problems
have more of the flavor of linear algebraic systems, either possessing a unique solution for
all possible inhomogeneities, or admitting either no solution or infinitely many solutions,
depending on the right-hand side. An interesting question is how to characterize the
inhomogeneities $f(x)$ that admit a solution, i.e., that lie in the image of the associated
linear operator. These issues are explored in depth in [61].


Exercises 


7.4.24.  For  each  of  the  following  inhomogeneous  systems,  determine  whether  the  right-hand 
side  lies  in  the  image  of  the  coefficient  matrix,  and,  if  so,  write  out  the  general  solution, 
clearly  identifying  the  particular  solution  and  the  kernel  element. 

/I  2 

/  ’  9  I  /I  \  /  I  \ 

O) 


(a) 


1  -1 
3  -3 


X 


1 

2 


2 

-1 


1 

2 


4 

1 


x 


1 

2 


(c) 


2 

VI 


o
-1\
(  —2

i\ 

(  1\ 

(  ~l 

3 

0 

2  \ 

( 

2  \ 

(d) 

-2 

3 

X  = 

0 

,  (e) 

2 

-6 

1 

-1 

x  = 

-2 

V  3 

-5J 

yi/ 

V-3 

9 

-2 

\ 

2  ) 

7.4.25.  Which  of  the  following  systems  have  a  unique  solution? 


(a) 


(*>) 


1 

2 


2 

3 


1 

2 


(2  1  -1\ 

(  u  ? 

(  3^i 

(1  4  -1\ 

f  u\ 

(- 2\ 

(c) 

0-3-3 

V 

— 

-1 

,  (d) 

1  3  -3 

v\  = 

-1 

V2  0-2 ) 

\wj 

V2  3  —2  J  \w  J 

^  1/ 

(a)  u  —  4 u  =  x  —  3,  (b)  5u"  —  4 u  +  4 u  =  ex 


cos  x . 


\ 

(  0\ 

X  = 

3 

) 

V  3  / 

7.4.26.  Solve  the  following  inhomogeneous  linear  ordinary  differential  equations: 


(c)  u"  —  3u  =  e 


3  x 


7.4.27. Solve the following initial value problems: (a) $u' + 3u = e^x$, $u(1) = 0$, (b) $u'' + 4u = 1$,
$u(\pi) = u'(\pi) = 0$, (c) $u'' - u' - 2u = e^x + e^{-x}$, $u(0) = u'(0) = 0$, (d) $u'' + 2u' + 5u = \sin x$,
$u(0) = 1$, $u'(0) = 0$, (e) $u''' - u'' + u' - u = x$, $u(0) = 0$, $u'(0) = 1$, $u''(0) = 0$.

7.4.28. Solve the following inhomogeneous Euler equations using either variation of parameters
or the change of variables method discussed in Exercise 7.4.13:

(a) $x^2 u'' + x u' - u = x$, (b) $x^2 u'' - 2x u' + 2u = \log x$, (c) $x^2 u'' - 3x u' - 5u = 3x - 5$.


7.4.29.  Write  down  all  solutions  to  the  following  boundary  value  problems.  Label  your  answer 
as  ( i )  unique  solution,  (n)  no  solution,  (Hi)  infinitely  many  solutions. 

(a)  un  +  2 u  =  2x,  a(0)  =  0,  u(tt)  =  0,  (b)  u  +  4i£  =  cosx,  u(—  tt)  =  0,  i^(7r)  =  1, 

(c)  u"  —  2 u  +  u  =  x  —  2,  i^(0)  =  —1,  i^(l)  =  1, 

(d)  u"  +  2u  +  2a  =  1,  a(0)  =  a(7r)  =  (e)  u"  —  3 u  +2 a  =  4x,  a(0)  =  0,  a(l)  =  0, 

(f)  x2  u"  -f  xu  —  a  =  0,  a(0)  =  1,  a(l)  =  0,  (g)  x2  u"  —  6a  =  0,  a(l)  =  1,  a(2)  =  —1, 

(h)  x2  u"  —  2xu  +  2a  =  0,  a(0)  =  0,  a(l)  =  1. 

0 7.4.30. Let $L\colon U \to V$ be a linear function, and let $W \subset U$ be a subspace of the domain
space. (a) Prove that $Y = \{\, L[w] \mid w \in W \,\} \subset \operatorname{img} L \subset V$ is a subspace of the image.

(b) Prove that $\dim Y \leq \dim W$. Conclude that a linear transformation can never increase
the dimension of a subspace.


0 7.4.31. (a) Show that if $L\colon V \to V$ is linear and $\ker L \neq \{0\}$, then $L$ is not invertible.

(b) Show that if $\operatorname{img} L \neq V$, then $L$ is not invertible.

(c) Give an example of a linear map with $\ker L = \{0\}$ that is not invertible. Hint: First
explain why your example must be on an infinite-dimensional vector space.


Superposition Principles for Inhomogeneous Systems

The  superposition  principle  for  inhomogeneous  linear  systems  allows  us  to  combine  different 
inhomogeneities  —  provided  that  we  do  not  change  the  underlying  linear  operator.  The 
result  is  a  straightforward  generalization  of  the  matrix  version  described  in  Theorem  2.44. 


Theorem 7.43. Let $L\colon U \to V$ be a linear function. Suppose that, for each $i = 1, \ldots, k$,
we know a particular solution $u_i^*$ to the inhomogeneous linear system $L[u] = f_i$ for some
$f_i \in \operatorname{img} L$. Then, given scalars $c_1, \ldots, c_k$, a particular solution to the combined
inhomogeneous system

$$L[u] = c_1 f_1 + \cdots + c_k f_k \tag{7.72}$$

is the corresponding linear combination

$$u^* = c_1 u_1^* + \cdots + c_k u_k^* \tag{7.73}$$

of particular solutions. The general solution to the inhomogeneous system (7.72) is

$$u = u^* + z = c_1 u_1^* + \cdots + c_k u_k^* + z, \tag{7.74}$$

where $z \in \ker L$ is an arbitrary solution to the associated homogeneous system $L[z] = 0$.


The proof is an easy consequence of linearity, and left to the reader. In physical terms,
the superposition principle can be interpreted as follows. If we know the response of a
linear physical system to several different external forces, represented by $f_1, \ldots, f_k$, then
the response of the system to a linear combination of these forces is just the self-same linear
combination of the individual responses. The homogeneous solution $z$ represents an internal
motion that the system acquires independent of any external forcing. Superposition
relies on the linearity of the system, and so is always applicable in quantum mechanics,
which is an inherently linear theory. On the other hand, in classical and relativistic mechanics,
superposition is valid only in the linear approximation regime governing small
motions/displacements/etc. Large-scale motions of a fully nonlinear physical system are
more subtle, and combinations of external forces may lead to unexpected results.

Example 7.44. In Example 7.42, we found that a particular solution to the linear differential
equation

$$u'' + u = x \qquad \text{is} \qquad u_1^* = x.$$

The method of undetermined coefficients can be used to solve the inhomogeneous equation

$$u'' + u = \cos x.$$

Since $\cos x$ and $\sin x$ are already solutions to the homogeneous equation, we must use the
solution ansatz

$$u = a x \cos x + b x \sin x,$$

which, when substituted into the differential equation, produces the particular solution

$$u_2^* = \tfrac12 x \sin x,$$

as follows from $(x \sin x)'' + x \sin x = 2 \cos x$. Therefore, by the superposition principle,
the combined inhomogeneous system

$$u'' + u = 3x - 2 \cos x$$

has a particular solution

$$u^* = 3 u_1^* - 2 u_2^* = 3x - x \sin x.$$

The general solution is obtained by appending an arbitrary solution to the homogeneous
equation:

$$u = 3x - x \sin x + c_1 \cos x + c_2 \sin x.$$
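The superposition computation for $u'' + u = 3x - 2\cos x$ can be checked numerically. The sketch below takes the particular solutions $u_1 = x$ of $u'' + u = x$ and $u_2 = \tfrac12 x \sin x$ of $u'' + u = \cos x$, with second derivatives computed by hand, and confirms that $3u_1 - 2u_2 = 3x - x\sin x$ has the combined right-hand side. The helper names `L` and `combine` are assumptions made for this illustration.

```python
import math


def L(u, d2u):
    """Return the function x -> u''(x) + u(x), given u and its second derivative."""
    return lambda x: d2u(x) + u(x)


def combine(c1, f1, c2, f2):
    """Pointwise linear combination c1 f1 + c2 f2."""
    return lambda x: c1 * f1(x) + c2 * f2(x)


# Particular solutions with hand-computed second derivatives.
u1, d2u1 = (lambda x: x), (lambda x: 0.0)
u2 = lambda x: 0.5 * x * math.sin(x)
d2u2 = lambda x: math.cos(x) - 0.5 * x * math.sin(x)

# Superposition: the same linear combination of solutions and of forcings.
u_star = combine(3, u1, -2, u2)
d2u_star = combine(3, d2u1, -2, d2u2)

for x in (0.0, 0.7, 2.0, -3.5):
    assert abs(L(u1, d2u1)(x) - x) < 1e-12
    assert abs(L(u2, d2u2)(x) - math.cos(x)) < 1e-12
    assert abs(L(u_star, d2u_star)(x) - (3 * x - 2 * math.cos(x))) < 1e-12
    assert abs(u_star(x) - (3 * x - x * math.sin(x))) < 1e-12
```

The same pattern works for any number of forcing terms, since only the linearity of $L$ is used.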


Example 7.45. Consider the boundary value problem

$$u'' + u = x, \qquad u(0) = 2, \qquad u(\tfrac12 \pi) = -1, \tag{7.75}$$

which is a modification of (7.70) with inhomogeneous boundary conditions. The superposition
principle applies here, and allows us to decouple the inhomogeneity due to the forcing
from the inhomogeneity due to the boundary conditions. We decompose the right-hand
side, written in vectorial form, into simpler constituents†

$$\begin{pmatrix} x \\ 2 \\ -1 \end{pmatrix} = \begin{pmatrix} x \\ 0 \\ 0 \end{pmatrix} + 2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} - \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$

The first vector on the right-hand side corresponds to the preceding boundary value problem
(7.70), whose solution was found in (7.71). The second and third vectors correspond
to the unforced boundary value problems

$$u'' + u = 0, \quad u(0) = 1, \quad u(\tfrac12 \pi) = 0, \qquad \text{and} \qquad u'' + u = 0, \quad u(0) = 0, \quad u(\tfrac12 \pi) = 1,$$

with respective solutions $u(x) = \cos x$ and $u(x) = \sin x$. Therefore, the solution to the
combined boundary value problem (7.75) is the same linear combination of these individual
solutions:

$$u(x) = \left( x - \tfrac12 \pi \sin x \right) + 2 \cos x - \sin x = x + 2 \cos x - \left( 1 + \tfrac12 \pi \right) \sin x.$$

The solution is unique because the corresponding homogeneous boundary value problem

$$z'' + z = 0, \qquad z(0) = 0, \qquad z(\tfrac12 \pi) = 0,$$

has only the trivial solution $z(x) = 0$, as you can verify.
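The decoupling used in Example 7.45 can be mirrored directly in code: one function for the forced problem with zero boundary data, one for each unit boundary condition, and the final solution as their linear combination with weights $1$, $2$, $-1$. This is a minimal sketch; the function names are illustrative, and the second derivative is approximated by a central difference, so the differential equation is only checked to a loose tolerance.

```python
import math

HALF_PI = math.pi / 2


def u_forced(x):
    """Solves u'' + u = x with u(0) = 0, u(pi/2) = 0, as in (7.71)."""
    return x - HALF_PI * math.sin(x)


def u_left(x):
    """Solves u'' + u = 0 with u(0) = 1, u(pi/2) = 0."""
    return math.cos(x)


def u_right(x):
    """Solves u'' + u = 0 with u(0) = 0, u(pi/2) = 1."""
    return math.sin(x)


def u(x):
    """Superposition with weights 1, 2, -1, matching the data (x, 2, -1)."""
    return u_forced(x) + 2 * u_left(x) - u_right(x)


def second_derivative(f, x, h=1e-4):
    """Central difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


# Boundary conditions of (7.75).
assert abs(u(0.0) - 2.0) < 1e-12
assert abs(u(HALF_PI) + 1.0) < 1e-12

# Differential equation u'' + u = x at a few interior points.
for x in (0.2, 0.8, 1.4):
    assert abs(second_derivative(u, x) + u(x) - x) < 1e-5
```

Changing the boundary data only changes the two weights on `u_left` and `u_right`; the forced part is reused unchanged.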


Exercises 

7.4.32.  Use  superposition  to  solve  the  following  inhomogeneous  ordinary  differential  equations: 
(a)  u  +  2u  =  1  +  cosx,  (b)  u  —  9 u  =  x  +  sinx,  (c)  9 u  —  1 8u'  +  10 u  =  1  +  ex  cosx, 
(d)  u"  +  u  —  2u  =  sinhx,  where  sinh  x  =  \  (ex  —  e~x),  (e)  u'"  +  9u'  =  1  +  e3x. 

7.4.33.  Consider  the  differential  equation  un  +  xu  =  2.  Suppose  you  know  solutions  to  the  two 
boundary  value  problems  u(0)  =  1,  u(  1)  =  0  and  u( 0)  =  0,  u(  1)  =  1.  List  all  possible 
boundary  value  problems  you  can  solve  using  superposition. 


† Warning. When writing out a linear combination, make sure the scalars are constants!
Writing the first summand as $x\,(1, 0, 0)^T$ will lead to an incorrect application of the superposition
principle.

7.4.34. Consider the differential equation $x u'' - (x + 1) u' + u = 0$. Suppose we know the
solution to the initial value problem $u(1) = 2$, $u'(1) = 1$ is $u(x) = x + 1$, while the solution
to the initial value problem $u(1) = 1$, $u'(1) = 1$ is $u(x) = e^{x-1}$. (a) What is the solution
to the initial value problem $u(1) = 3$, $u'(1) = -2$? (b) What is the general solution to the
differential equation?


7.4.35. Consider the differential equation $4x u'' + 2u' + u = 0$. Given that $\cos \sqrt{x}$ solves the
boundary value problem $u(\tfrac14 \pi^2) = 0$, $u(\pi^2) = -1$, and $\sin \sqrt{x}$ solves the boundary value
problem $u(\tfrac14 \pi^2) = 1$, $u(\pi^2) = 0$, write down the solution to the boundary value problem
$u(\tfrac14 \pi^2) = -3$, $u(\pi^2) = 7$.


7.4.36.  Solve  the  following  boundary  value  problems  by  using  superposition:  (a)  u"  +  9u  =  x, 
a(0)  =  1,  u(tt)  =  0,  (b)  un  —  8 it7  +  16a  =  e4x,  a(0)  =  1,  u(l)  =  0,  (c)  u"  +  4a  =  sin3x, 
{/(O)  =  0,  a(27r)  =  3,  (d)  u"  —  2u  +  u  =  1  +  ex,  i/(0)  =  —1,  r/(l)  =  1. 


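7.4.37. Given that $x^2 + y^2$ solves the Poisson equation
$\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} = 4$, while $x^4 + y^4$ solves
$\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} = 12\,(x^2 + y^2)$, write down a solution to
$\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} = 1 + x^2 + y^2$.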


C 7.4.38. Reduction of order: Suppose you know one solution $u_1(x)$ to the second order
homogeneous differential equation $u'' + a(x) u' + b(x) u = 0$. (a) Show that if $u(x) =
v(x) u_1(x)$ is any other solution, then $w(x) = v'(x)$ satisfies a first order differential
equation. (b) Use reduction of order to find the general solution to the following equations,
based on the indicated solution:

(i) $u'' - 2u' + u = 0$, $u_1(x) = e^x$, (ii) $x u'' + (x - 1) u' - u = 0$, $u_1(x) = x - 1$,

(iii) $u'' + 4x u' + (4x^2 + 2) u = 0$, $u_1(x) = e^{-x^2}$, (iv) $u'' - (x^2 + 1) u = 0$, $u_1(x) = e^{x^2/2}$.

0  7.4.39.  Write  out  the  details  of  the  proof  of  Theorem  7.43. 


Complex  Solutions  to  Real  Systems 

As  we  know,  solutions  to  a  linear,  homogeneous,  constant  coefficient  ordinary  differential 
equation  are  found  by  substituting  an  exponential  ansatz,  which  effectively  reduces  the 
differential equation to the polynomial characteristic equation. Complex roots of the characteristic
equation yield complex exponential solutions. But, if the equation is real, then
the  real  and  imaginary  parts  of  the  complex  solutions  are  automatically  real  solutions.  This 
solution  technique  is  a  particular  case  of  a  general  principle  for  producing  real  solutions 
to  real  linear  systems  from,  typically,  simpler  complex  solutions.  To  work,  the  method 
requires  us  to  impose  some  additional  structure  on  the  complex  vector  spaces  involved. 

Definition 7.46. A complex vector space $V$ is called conjugated if it admits an operation
of complex conjugation taking $u \in V$ to $\bar u \in V$ with the following properties:

(a) conjugating twice returns one to the original vector: $\bar{\bar u} = u$;

(b) compatibility with vector addition: $\overline{u + v} = \bar u + \bar v$ for all $u, v \in V$;

(c) compatibility with scalar multiplication: $\overline{\lambda u} = \bar\lambda\, \bar u$, for all $\lambda \in \mathbb{C}$ and $u \in V$.

The simplest example of a conjugated vector space is $\mathbb{C}^n$. The complex conjugate
of a vector $u = (u_1, u_2, \ldots, u_n)^T$ is obtained by conjugating all its entries, whereby
$\bar u = (\bar u_1, \bar u_2, \ldots, \bar u_n)^T$. Thus,

$$u = v + i w, \quad \bar u = v - i w, \qquad \text{where} \qquad v = \operatorname{Re} u = \frac{u + \bar u}{2}, \quad w = \operatorname{Im} u = \frac{u - \bar u}{2 i}, \tag{7.76}$$

are the real and imaginary parts of $u \in \mathbb{C}^n$. For example, if


u  = 


whence 


then 


5 


The other prototypical example of a conjugated vector space is the space of complex-valued
functions $f(x) = r(x) + i s(x)$ defined on the interval $a < x < b$. The complex
conjugate function is $\bar f(x) = \overline{f(x)} = r(x) - i s(x)$. Thus, the complex conjugate of
$e^{(1+3i)x} = e^x \cos 3x + i e^x \sin 3x$ is $\overline{e^{(1+3i)x}} = e^{(1-3i)x} = e^x \cos 3x - i e^x \sin 3x$, with

$$\operatorname{Re} e^{(1+3i)x} = e^x \cos 3x, \qquad \operatorname{Im} e^{(1+3i)x} = e^x \sin 3x.$$

An element $v \in V$ of a conjugated vector space is called real if $\bar v = v$. One easily checks
that the real and imaginary parts of a general element, as defined by (7.76), are both real
elements.


Warning. Not all subspaces of a conjugated vector space are conjugated. For example,
the one-dimensional subspace of $\mathbb{C}^2$ spanned by $v_1 = (1, 2)^T$ is conjugated. Indeed, the
complex conjugate of a general element $c v_1 = (c, 2c)^T$ is $(\bar c, 2 \bar c)^T = \bar c\, v_1$, which also
belongs to the subspace. On the other hand, the subspace spanned by $(1, i)^T$ is not
conjugated, because the complex conjugate of the element $(c, i c)^T$ is $(\bar c, -i \bar c)^T$, which
does not belong to the subspace unless $c = 0$. In Exercise 7.4.50 you are asked to prove
that a subspace $V \subset \mathbb{C}^n$ is conjugated if and only if it has a basis $v_1, \ldots, v_k$ consisting
entirely of real vectors. While conjugated subspaces play a role in certain applications, in
practice we will deal only with $\mathbb{C}^n$ and the entire space of complex-valued functions, and
so can suppress most of these somewhat technical details.


Definition 7.47. A linear function $L\colon U \to V$ between conjugated vector spaces is called
real if it commutes with complex conjugation:

$$L[\bar u] = \overline{L[u]}. \tag{7.77}$$

For example, any linear function $L\colon \mathbb{C}^n \to \mathbb{C}^m$ is given by multiplication by an $m \times n$
matrix: $L[u] = A u$. The function is real if and only if $A$ is a real matrix. Similarly, a
differential operator (7.15) is real if and only if its coefficients are real-valued functions.

The  solutions  to  a  homogeneous  system  defined  by  a  real  linear  function  satisfy  the 
following  general  Reality  Principle. 


Theorem 7.48. If $L[u] = 0$ is a real homogeneous linear system and $u = v + i w$ is a
complex solution, then its complex conjugate $\bar u = v - i w$ is also a solution. Moreover,
both the real and imaginary parts, $v$ and $w$, of a complex solution are real solutions.


Proof: First note that, by reality, $L[\bar u] = \overline{L[u]} = 0$ whenever $L[u] = 0$, and hence the
complex conjugate $\bar u$ of any solution is also a solution. Therefore, by linear superposition,
$v = \operatorname{Re} u = \tfrac12 (u + \bar u)$ and $w = \operatorname{Im} u = \tfrac{1}{2i} (u - \bar u)$ are also solutions. Q.E.D.

Example 7.49. The real linear matrix system

$$\begin{pmatrix} 2 & -1 & 3 & 0 \\ -2 & 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

has a complex solution

$$u = \begin{pmatrix} -1 - 3i \\ 1 \\ 1 + 2i \\ -2 - 4i \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \\ 1 \\ -2 \end{pmatrix} + i \begin{pmatrix} -3 \\ 0 \\ 2 \\ -4 \end{pmatrix}.$$

Since the coefficient matrix is real, the real and imaginary parts,

$$v = (-1, 1, 1, -2)^T, \qquad w = (-3, 0, 2, -4)^T,$$

are both solutions of the system, as can easily be checked.

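On the other hand, the complex linear system

$$\begin{pmatrix} 2 & -2i & i & 0 \\ 1 + i & 0 & -2 - i & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

has the complex solution

$$u = \begin{pmatrix} 1 - i \\ -i \\ 2 \\ 2 + 2i \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 2 \\ 2 \end{pmatrix} + i \begin{pmatrix} -1 \\ -1 \\ 0 \\ 2 \end{pmatrix}.$$

However, neither its real nor its imaginary part is a solution to the system.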
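Both halves of Example 7.49 can be confirmed with a few lines of complex arithmetic. In the sketch below, the matrix entries are read off from the example; in particular the entry $-2 - i$ in the second row of the complex matrix is an assumption, fixed by the requirement that the stated vector actually solves the system. The helper names are illustrative.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector; entries may be complex."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def real_part(v):
    return [complex(z).real for z in v]


def imag_part(v):
    return [complex(z).imag for z in v]


def is_zero(v, tol=1e-12):
    return all(abs(z) < tol for z in v)


# Real system: conjugation commutes with A, so Re u and Im u are solutions too.
A_real = [[2, -1, 3, 0], [-2, 1, 1, 2]]
u = [-1 - 3j, 1, 1 + 2j, -2 - 4j]
assert is_zero(matvec(A_real, u))
assert is_zero(matvec(A_real, real_part(u)))
assert is_zero(matvec(A_real, imag_part(u)))

# Complex system: u_c is a solution, but its real and imaginary parts are not.
A_cplx = [[2, -2j, 1j, 0], [1 + 1j, 0, -2 - 1j, 1]]
u_c = [1 - 1j, -1j, 2, 2 + 2j]
assert is_zero(matvec(A_cplx, u_c))
assert not is_zero(matvec(A_cplx, real_part(u_c)))
assert not is_zero(matvec(A_cplx, imag_part(u_c)))
```

The contrast between the two systems is exactly the hypothesis of Theorem 7.48: the Reality Principle needs a real coefficient matrix.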

Example 7.50. Consider the real homogeneous ordinary differential equation

$$u'' + 2u' + 5u = 0.$$

To solve it, we use the usual exponential ansatz $u = e^{\lambda x}$, leading to the characteristic
equation

$$\lambda^2 + 2\lambda + 5 = 0.$$

There are two roots,

$$\lambda_1 = -1 + 2i, \qquad \lambda_2 = -1 - 2i,$$

leading, via Euler's formula (3.92), to the complex solutions

$$u_1(x) = e^{(-1+2i)x} = e^{-x} \cos 2x + i e^{-x} \sin 2x,$$
$$u_2(x) = e^{(-1-2i)x} = e^{-x} \cos 2x - i e^{-x} \sin 2x.$$

The complex conjugate of the first solution is the second, in accordance with Theorem 7.48.
Moreover, the real and imaginary parts of the two solutions

$$v(x) = e^{-x} \cos 2x, \qquad w(x) = e^{-x} \sin 2x,$$

are individual real solutions. Since the solution space is two-dimensional, the general
solution is a linear combination

$$u(x) = c_1 e^{-x} \cos 2x + c_2 e^{-x} \sin 2x,$$

of the two linearly independent real solutions.

Example 7.51. Consider the Euler differential equation

$$x^2 u'' + 7x u' + 13 u = 0.$$

The roots of the associated characteristic equation

$$r(r - 1) + 7r + 13 = r^2 + 6r + 13 = 0$$

are complex: $r = -3 \pm 2i$, and the resulting solutions $x^{-3+2i}$, $x^{-3-2i}$ are complex conjugate
powers. We use Euler's formula (3.92) to obtain their real and imaginary parts:

$$x^{-3+2i} = x^{-3} e^{2i \log x} = x^{-3} \cos(2 \log x) + i x^{-3} \sin(2 \log x),$$

valid for $x > 0$. (For $x < 0$ just replace $x$ by $-x$ in the above formula.) Again, by
Theorem 7.48, the real and imaginary parts of the complex solution are by themselves real
solutions to the equation. Therefore, the general real solution to this differential equation
for $x > 0$ is

$$u(x) = c_1 x^{-3} \cos(2 \log x) + c_2 x^{-3} \sin(2 \log x).$$
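Both the constant coefficient equation $u'' + 2u' + 5u = 0$ and the Euler equation above reduce to a quadratic characteristic equation with complex roots, and in both cases the real part of the complex solution is again a solution. The Python sketch below computes the roots with `cmath` and evaluates the real part of each differential expression on the complex solution, which should vanish up to rounding. The function names are assumptions made for this illustration.

```python
import cmath
import math


def quadratic_roots(a, b, c):
    """Both (possibly complex) roots of a t^2 + b t + c = 0."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


def const_coeff_residual(lam, x):
    """Real part of u'' + 2u' + 5u for the complex solution u = exp(lam x)."""
    u = cmath.exp(lam * x)
    return (lam * lam * u + 2 * lam * u + 5 * u).real


def euler_residual(r, x):
    """Real part of x^2 u'' + 7x u' + 13u for u = x^r with x > 0."""
    u = cmath.exp(r * math.log(x))      # x^r for a complex exponent r
    d1 = r * u / x                      # u'  = r x^(r-1)
    d2 = r * (r - 1) * u / (x * x)      # u'' = r (r-1) x^(r-2)
    return (x * x * d2 + 7 * x * d1 + 13 * u).real


lam1, lam2 = quadratic_roots(1, 2, 5)    # lambda^2 + 2 lambda + 5 = 0
r1, r2 = quadratic_roots(1, 6, 13)       # r^2 + 6 r + 13 = 0
print(lam1, lam2, r1, r2)

for x in (0.5, 1.0, 2.5):
    assert abs(const_coeff_residual(lam1, x)) < 1e-9
    assert abs(euler_residual(r1, x)) < 1e-9
    # Re x^(-3+2i) agrees with x^(-3) cos(2 log x).
    assert abs(cmath.exp(r1 * math.log(x)).real - x ** -3 * math.cos(2 * math.log(x))) < 1e-9
```

Because both operators have real coefficients, taking the real part commutes with applying the operator, which is why checking the real part of the complex residual suffices.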


Example 7.52. The complex monomial

$$u(x, y) = (x + i y)^n$$

is a solution to the Laplace equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0$$

because, by the chain rule,

$$\frac{\partial^2 u}{\partial x^2} = n(n-1)(x + i y)^{n-2}, \qquad \frac{\partial^2 u}{\partial y^2} = n(n-1)\, i^2 (x + i y)^{n-2} = -n(n-1)(x + i y)^{n-2}.$$

Since the Laplacian operator is real, Theorem 7.48 implies that the real and imaginary parts
of this complex solution are real solutions. The resulting real solutions are the harmonic
polynomials introduced in Example 7.36.

Knowing this, it is relatively easy to find the explicit formulas for the harmonic polynomials.
We appeal to the binomial formula

$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^{k}, \qquad \text{where} \qquad \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \tag{7.78}$$

is the standard notation for the binomial coefficients. Since $i^2 = -1$, $i^3 = -i$, $i^4 = 1$,
etc., we have

$$(x + i y)^n = x^n + n x^{n-1} (i y) + \binom{n}{2} x^{n-2} (i y)^2 + \binom{n}{3} x^{n-3} (i y)^3 + \cdots + (i y)^n$$
$$= x^n + i\, n x^{n-1} y - \binom{n}{2} x^{n-2} y^2 - i \binom{n}{3} x^{n-3} y^3 + \cdots.$$

Separating the real and imaginary terms, we obtain the explicit formulas

$$\operatorname{Re} (x + i y)^n = x^n - \binom{n}{2} x^{n-2} y^2 + \binom{n}{4} x^{n-4} y^4 + \cdots,$$
$$\operatorname{Im} (x + i y)^n = n x^{n-1} y - \binom{n}{3} x^{n-3} y^3 + \binom{n}{5} x^{n-5} y^5 + \cdots, \tag{7.79}$$

for the two independent harmonic polynomials of degree $n$. The first few of these polynomials
were described in Example 7.36. In fact, it can be proved that the most general
solution to the Laplace equation can be written as a convergent infinite series in the basic
harmonic polynomials, cf. [61].
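The formulas (7.79) translate directly into a short program: the coefficient of $x^{n-k} y^k$ in $(x + i y)^n$ is $\binom{n}{k} i^k$, which is real for even $k$ and purely imaginary for odd $k$. The sketch below builds the two harmonic polynomials of degree $n$ as coefficient dictionaries and verifies symbolically that their Laplacians vanish. The representation (a dict mapping exponent pairs to coefficients) and the function names are assumptions made for this illustration.

```python
from math import comb


def harmonic_polynomials(n):
    """Return (re, im): dicts mapping (i, j) to the coefficient of x^i y^j
    in Re (x + iy)^n and Im (x + iy)^n respectively."""
    re, im = {}, {}
    for k in range(n + 1):
        # i^k cycles through 1, i, -1, -i, so the sign is negative for k = 2, 3 mod 4.
        sign = -1 if k % 4 in (2, 3) else 1
        target = re if k % 2 == 0 else im
        target[(n - k, k)] = sign * comb(n, k)
    return re, im


def laplacian(p):
    """Apply d^2/dx^2 + d^2/dy^2 to a polynomial stored as a coefficient dict."""
    out = {}
    for (i, j), c in p.items():
        if i >= 2:
            out[(i - 2, j)] = out.get((i - 2, j), 0) + c * i * (i - 1)
        if j >= 2:
            out[(i, j - 2)] = out.get((i, j - 2), 0) + c * j * (j - 1)
    # Drop terms that cancelled, so the zero polynomial is the empty dict.
    return {m: c for m, c in out.items() if c != 0}


re3, im3 = harmonic_polynomials(3)
print(re3)  # coefficients of x^3 - 3 x y^2
print(im3)  # coefficients of 3 x^2 y - y^3

for n in range(9):
    re_p, im_p = harmonic_polynomials(n)
    assert laplacian(re_p) == {} and laplacian(im_p) == {}
```

Working with integer coefficients keeps the check exact, so an empty dictionary really does certify that the polynomial is harmonic.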


Exercises 


7.4.40. Can you find a complex matrix $A$ such that $\ker A \neq \{0\}$ and the real and imaginary
parts of every complex solution to $A u = 0$ are also solutions?

7.4.41. Find the general real solution to the following homogeneous differential equations:

(a) $u'' + 4u = 0$, (b) $u'' + 6u' + 10u = 0$, (c) $2u'' + 3u' - 5u = 0$, (d) $u''' + u = 0$,
(e) $u'''' + 13u'' + 36u = 0$, (f) $x^2 u'' - x u' + 3u = 0$, (g) $x^3 u''' + x^2 u'' + 3x u' - 8u = 0$.

7.4.42. The following functions are solutions to a real constant coefficient homogeneous scalar
ordinary differential equation. (i) Determine the least possible order of the differential
equation, and (ii) write down an appropriate differential equation. (a) $e^{-x} \sin 3x$,
(b) $x \sin x$, (c) $1 + x e^{-x} \cos 2x$, (d) $\sin x + \cos 2x$, (e) $\sin x + x^2 \cos x$.

7.4.43.  Find  the  general  solution  to  the  following  complex  ordinary  differential  equations. 
Verify  that,  in  these  cases,  the  real  and  imaginary  parts  of  a  complex  solution  are  not  real 

solutions,  (a)  u  +  iu  =  0,  (b)  u"  —  i u  +  ( i  —  1) u  =  0,  (c)  u"  —  iu  =  0. 

7.4.44.  (a)  Write  down  the  explicit  formulas  for  the  harmonic  polynomials  of  degree  4  and 
check  that  they  are  indeed  solutions  to  the  Laplace  equation,  (b)  Prove  that  every 
homogeneous  polynomial  solution  of  degree  4  is  a  linear  combination  of  the  two  basic 
harmonic  polynomials. 

7.4.45.  Find  all  complex  exponential  solutions  u(t,x)  =  eujt+kx  of  the  beam  equation 


dzu  d^u 


dt2  dx4 


.  How  many  different  real  solutions  can  you  produce? 


T 7.4.46. (a) Show that, if $k \in \mathbb{R}$, then $u(t,x) = e^{-k^2 t + i k x}$ is a complex solution to the heat equation $\dfrac{\partial u}{\partial t} = \dfrac{\partial^2 u}{\partial x^2}$. (b) Use complex conjugation to write down another complex solution. (c) Find two independent real solutions to the heat equation. (d) Can $k$ be complex? If so, what real solutions are produced? (e) Which of your solutions decay to zero as $t \to \infty$? (f) Can you solve the exercise assuming $k \in \mathbb{C} \setminus \mathbb{R}$ is not real?


7.4.47. Show that the free space Schrödinger equation $i \, \dfrac{\partial u}{\partial t} = \dfrac{\partial^2 u}{\partial x^2}$ is not a real linear system by constructing a complex quadratic polynomial solution and verifying that its real and imaginary parts are not solutions.

7.4.48. Which of the following sets of vectors span conjugated subspaces of $\mathbb{C}^3$? [The column vectors for parts (a) through (e) are illegible in the source.]

0  7.4.49.  Prove  that  the  real  and  imaginary  parts  of  a  general  element  of  a  conjugated  vector 
space,  as  defined  by  (7.76),  are  both  real  elements. 

0  7.4.50. Prove that a subspace $V \subset \mathbb{C}^n$ is conjugated if and only if it admits a basis all of whose elements are real.

0  7.4.51. Prove that if $L[u] = f$ is a real inhomogeneous linear system with real right-hand side $f$, and $u = v + i w$ is a complex solution, then its real part $v$ is a solution to the system, $L[v] = f$, while its imaginary part $w$ solves the homogeneous system $L[w] = 0$.

0  7.4.52. Prove that a linear function $L: \mathbb{C}^n \to \mathbb{C}^m$ is real if and only if $L[u] = A u$, where $A$ is a real $m \times n$ matrix.

0  7.4.53.  Let  u  =  x  +  iy  be  a  complex  solution  to  a  real  linear  system.  Under  what  conditions 
are  its  real  and  imaginary  parts  x,  y  linearly  independent  real  solutions? 


7.5  Adjoints, Positive Definite Operators, and Minimization Principles

Sections 2.5 and 4.4 revealed the importance of the adjoint system $A^T y = f$ in the analysis of systems of linear algebraic equations $A x = b$. Two of the four fundamental matrix subspaces are based on the transposed matrix. While the $m \times n$ matrix $A$ defines a linear function from $\mathbb{R}^n$ to $\mathbb{R}^m$, its transpose, $A^T$, has size $n \times m$ and hence characterizes a linear function in the reverse direction, from $\mathbb{R}^m$ to $\mathbb{R}^n$.

As with most basic concepts for linear algebraic systems, the adjoint system and transpose operation on the coefficient matrix are the prototypes of a more general construction that is valid for general linear functions. However, it is not immediately obvious how to “transpose” a more general linear operator $L[u]$, e.g., a differential operator acting on function space. In this section, we shall introduce the abstract concept of the adjoint of a linear function that generalizes the transpose operation on matrices. This will be followed by a general characterization of positive definite linear operators and the characterization of the solutions to the associated linear systems via minimization principles. Unfortunately, we will not have sufficient analytical tools to develop most of the interesting examples, and instead refer the interested reader to [61, 79].

The adjoint (and transpose) relies on an inner product structure on both the domain and codomain spaces. For simplicity, we restrict our attention to real inner product spaces, leaving the complex version to the interested reader. Thus, we begin with a linear function $L: U \to V$ that maps an inner product space $U$ to a second inner product space $V$. We distinguish the inner products on $U$ and $V$ (which may be different even when $U$ and $V$ are the same vector space) by using a single angle bracket $\langle u, \tilde u \rangle$ to denote the inner product between $u, \tilde u \in U$, and a double angle bracket $\langle\langle v, \tilde v \rangle\rangle$ to denote the inner product between $v, \tilde v \in V$.

Once inner products on both the domain and codomain are prescribed, the abstract definition of the adjoint of a linear function can be formulated.

Definition 7.53. Let $U$, $V$ be inner product spaces, and let $L: U \to V$ be a linear function. The adjoint of $L$ is the function† $L^*: V \to U$ that satisfies

$$\langle\langle L[u], v \rangle\rangle = \langle u, L^*[v] \rangle \quad \text{for all} \quad u \in U, \ v \in V. \tag{7.80}$$

† The notation $L^*$ unfortunately coincides with that of the dual linear function introduced in Exercise 7.2.30. These clashing notations are both well established in the literature, although occasionally a prime, as in $U'$, $L'$, is used for dual spaces, maps, etc. However, it is possible to reconcile the two notations in a natural manner; see Exercise 7.5.10. In this book, the dual notation appears only in these few exercises.

Note that the adjoint function goes in the opposite direction to $L$, just like the transposed matrix. Also, the left-hand side of equation (7.80) indicates the inner product on $V$, while the right-hand side is the inner product on $U$, which is where the respective vectors live. In infinite-dimensional situations, the adjoint may not exist. But if it does, then it is uniquely determined by (7.80); see Exercise 7.5.7.

Remark.  Technically,  (7.80)  serves  to  define  the  “formal  adjoint”  of  the  linear  operator  L. 
For  the  infinite-dimensional  function  spaces  arising  in  analysis,  a  true  adjoint  must  satisfy 
certain  additional  analytical  requirements,  [50,  67].  However,  for  pedagogical  reasons,  it  is 
better  to  suppress  such  advanced  analytical  complications  in  this  introductory  treatment. 


Lemma  7.54.  The  adjoint  of  a  linear  function  is  a  linear  function. 


Proof: Given $u \in U$, $v, w \in V$, and scalars $c, d \in \mathbb{R}$, using the defining property of the adjoint and the bilinearity of the two inner products produces

$$\langle u, L^*[c v + d w] \rangle = \langle\langle L[u], c v + d w \rangle\rangle = c \, \langle\langle L[u], v \rangle\rangle + d \, \langle\langle L[u], w \rangle\rangle = c \, \langle u, L^*[v] \rangle + d \, \langle u, L^*[w] \rangle = \langle u, c L^*[v] + d L^*[w] \rangle.$$

Since this holds for all $u \in U$, we must have

$$L^*[c v + d w] = c L^*[v] + d L^*[w],$$

thereby proving linearity.  Q.E.D.


Example 7.55. Let us first show how the defining equation (7.80) for the adjoint leads directly to the transpose of a matrix. Let $L: \mathbb{R}^n \to \mathbb{R}^m$ be the linear function $L[u] = A u$ defined by multiplication by the $m \times n$ matrix $A$. Then $L^*: \mathbb{R}^m \to \mathbb{R}^n$ is linear, and so is represented by matrix multiplication, $L^*[v] = A^* v$, by an $n \times m$ matrix $A^*$. We impose the ordinary Euclidean dot products

$$\langle u, \tilde u \rangle = u \cdot \tilde u = u^T \tilde u, \quad u, \tilde u \in \mathbb{R}^n, \qquad \langle\langle v, \tilde v \rangle\rangle = v \cdot \tilde v = v^T \tilde v, \quad v, \tilde v \in \mathbb{R}^m, \tag{7.81}$$

as our inner products on both $\mathbb{R}^n$ and $\mathbb{R}^m$. Evaluation of both sides of the adjoint identity (7.80) yields

$$\langle\langle L[u], v \rangle\rangle = \langle\langle A u, v \rangle\rangle = (A u)^T v = u^T A^T v, \qquad \langle u, L^*[v] \rangle = \langle u, A^* v \rangle = u^T A^* v.$$

Since these expressions must agree for all $u, v$, the matrix $A^*$ representing $L^*$ is equal to the transposed matrix $A^T$, as justified in Exercise 1.6.13. We conclude that the adjoint of a matrix with respect to the Euclidean dot product is its transpose: $A^* = A^T$.
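The identity in Example 7.55 is easy to test numerically. The following is a minimal sketch, assuming a small integer matrix and two vectors chosen purely for illustration (none of these values come from the text); the helper names `matvec`, `transpose`, and `dot` are likewise hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product A x, with A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def dot(x, y):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(x, y))

# Illustrative data (an assumption, not from the text):
# A is 2 x 3, so L maps R^3 to R^2 and its adjoint maps R^2 back to R^3.
A = [[1, 2, 0],
     [-1, 3, 4]]
u = [2, -1, 5]   # vector in the domain R^3
v = [3, 7]       # vector in the codomain R^2

lhs = dot(matvec(A, u), v)              # <<L[u], v>>, dot product on R^2
rhs = dot(u, matvec(transpose(A), v))   # <u, L*[v]>, dot product on R^3
print(lhs, rhs)                         # both equal 105
```

Note that the two dot products are taken in different spaces: the left one pairs vectors with two entries, the right one vectors with three.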


Remark.  See  Exercise  7.2.30  for  another  interpretation  of  the  transpose  in  terms  of  dual 
vector  spaces.  Again,  keep  in  mind  that  the  *  notation  has  a  different  meaning  there. 

Example 7.56. Let us now adopt different, weighted inner products on the domain and codomain for the linear map $L: \mathbb{R}^n \to \mathbb{R}^m$ given by $L[u] = A u$. Suppose that

(i) the inner product on the domain space $\mathbb{R}^n$ is given by $\langle u, \tilde u \rangle = u^T M \tilde u$, while

(ii) the inner product on the codomain $\mathbb{R}^m$ is given by $\langle\langle v, \tilde v \rangle\rangle = v^T C \tilde v$.

Here $M$ and $C$ are positive definite matrices of respective sizes $n \times n$ and $m \times m$. Then, in place of (7.81), we have

$$\langle\langle A u, v \rangle\rangle = (A u)^T C v = u^T A^T C v, \qquad \langle u, A^* v \rangle = u^T M A^* v.$$

Equating these expressions, we deduce that $A^T C = M A^*$. Therefore, the weighted adjoint of the matrix $A$ is given by the more complicated formula

$$A^* = M^{-1} A^T C. \tag{7.82}$$

In mechanical applications, $M$ plays the role of the mass matrix, and explicitly appears in the dynamical systems to be studied in Chapter 10. In particular, suppose $A$ is square, defining a linear transformation $L: \mathbb{R}^n \to \mathbb{R}^n$. If we adopt the same inner product $\langle v, \tilde v \rangle = v^T C \tilde v$ on both the domain and codomain spaces $\mathbb{R}^n$, then its adjoint matrix $A^* = C^{-1} A^T C$ is similar to its transpose.
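Formula (7.82) can be checked on a small case. The sketch below assumes illustrative matrices $A$, $M$, $C$ that do not appear in the text, uses exact rational arithmetic, and inverts $M$ with a hypothetical helper `inverse_2x2` that only handles the $2 \times 2$ case.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(M):
    """Exact inverse of a 2 x 2 matrix (assumed nonsingular)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def inner(x, W, y):
    """Weighted inner product x^T W y."""
    return sum(s * t for s, t in zip(x, matvec(W, y)))

# Illustrative data (assumptions, not from the text).
A = [[1, 0], [2, 1], [0, -1]]            # 3 x 2, so L maps R^2 to R^3
M = [[2, 1], [1, 3]]                     # positive definite weight on R^2
C = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]    # positive definite weight on R^3

# Weighted adjoint A* = M^{-1} A^T C, formula (7.82).
A_star = matmul(matmul(inverse_2x2(M), transpose(A)), C)

u = [1, 2]
v = [1, -1, 2]
lhs = inner(matvec(A, u), C, v)        # <<A u, v>> uses the weight C
rhs = inner(u, M, matvec(A_star, v))   # <u, A* v> uses the weight M
print(lhs, rhs)                        # both equal -19
```

Here `A_star` differs from the plain transpose of `A`, so the choice of inner products genuinely changes the adjoint.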


Everything  that  we  learned  about  transposes  can  be  reinterpreted  in  the  more  general 
language  of  adjoints.  First,  applying  the  adjoint  operation  twice  returns  you  to  where  you 
began;  this  is  an  immediate  consequence  of  the  defining  equation  (7.80). 

Proposition 7.57. The adjoint of the adjoint of $L$ is just $L = (L^*)^*$.

The  next  result  generalizes  the  fact,  (1.55),  that  the  transpose  of  the  product  of  two 
matrices  is  the  product  of  the  transposes,  in  the  reverse  order. 

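Proposition 7.58. Let $U, V, W$ be inner product spaces. If $L: U \to V$ and $M: V \to W$ have respective adjoints $L^*: V \to U$ and $M^*: W \to V$, then the composite linear function $M \circ L: U \to W$ has adjoint $(M \circ L)^* = L^* \circ M^*$, which maps $W$ to $U$.

Proof: Let $\langle u, \tilde u \rangle$, $\langle\langle v, \tilde v \rangle\rangle$, $\langle\langle\langle w, \tilde w \rangle\rangle\rangle$ denote, respectively, the inner products on $U$, $V$, $W$. For $u \in U$, $w \in W$, we compute using the definition (7.80) repeatedly:

$$\langle u, (M \circ L)^*[w] \rangle = \langle\langle\langle M \circ L[u], w \rangle\rangle\rangle = \langle\langle\langle M[L[u]], w \rangle\rangle\rangle = \langle\langle L[u], M^*[w] \rangle\rangle = \langle u, L^*[M^*[w]] \rangle = \langle u, L^* \circ M^*[w] \rangle.$$

Since this holds for all $u$ and $w$, the identification follows.  Q.E.D.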
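In the matrix setting, Proposition 7.58 can be confirmed directly from formula (7.82). The following worked computation is a sketch under stated assumptions: the spaces are $\mathbb{R}^n$, $\mathbb{R}^m$, $\mathbb{R}^p$ with weighted inner products given by positive definite matrices written here as $M_U$, $M_V$, $M_W$ (notation introduced only for this illustration), and the two maps are multiplication by matrices $A$ and $B$.

```latex
% Assumed setup (illustrative): L[u] = A u, M[v] = B v, with weights M_U, M_V, M_W.
\begin{aligned}
A^* &= M_U^{-1} A^T M_V, \qquad B^* = M_V^{-1} B^T M_W, \\
(BA)^* &= M_U^{-1} (BA)^T M_W
        = M_U^{-1} A^T B^T M_W \\
       &= \bigl( M_U^{-1} A^T M_V \bigr) \bigl( M_V^{-1} B^T M_W \bigr)
        = A^* B^* .
\end{aligned}
```

The middle weight $M_V$ cancels against its inverse, which is why the adjoint of the composite does not depend on how the intermediate inner product is split between the two factors.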


In this chapter, we have been able to actually compute adjoints in just the finite-dimensional situation, when the linear functions are given by matrix multiplication. For the more challenging case of adjoints of linear operators on function spaces, e.g., differential operators appearing in boundary value problems, the reader should consult [61].


Exercises 


7.5.1. Choose one from the following list of inner products on $\mathbb{R}^2$. Then find the adjoint of $A = \begin{pmatrix} 1 & 2 \\ 1 & 3 \end{pmatrix}$ when your inner product is used on both its domain and codomain: (a) the Euclidean dot product; (b) the weighted inner product $\langle v, w \rangle = 2 v_1 w_1 + 3 v_2 w_2$; (c) the inner product $\langle v, w \rangle = v^T K w$ defined by the positive definite matrix $K = \begin{pmatrix} 2 & -1 \\ -1 & 4 \end{pmatrix}$.


7.5.2.  From  the  list  in  Exercise  7.5.1,  choose  different  inner  products  on  the  domain  and 
codomain,  and  then  determine  the  adjoint  of  the  matrix  A. 

7.5.3. Choose one from the following list of inner products on $\mathbb{R}^3$ for both the domain and codomain, and find the adjoint of $A = \begin{pmatrix} 1 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & -1 & 2 \end{pmatrix}$: (a) the Euclidean dot product; (b) the weighted inner product $\langle v, w \rangle = v_1 w_1 + 2 v_2 w_2 + 3 v_3 w_3$; (c) the inner product $\langle v, w \rangle = v^T K w$ defined by the positive definite matrix $K = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}$.

7.5.4.  From  the  list  in  Exercise  7.5.3,  choose  different  inner  products  on  the  domain  and 
codomain,  and  then  compute  the  adjoint  of  the  matrix  A. 

7.5.5. Choose an inner product on $\mathbb{R}^2$ from the list in Exercise 7.5.1, and an inner product on $\mathbb{R}^3$ from the list in Exercise 7.5.3, and then compute the adjoint of $A = \begin{pmatrix} 1 & 3 \\ 0 & 2 \\ -1 & 1 \end{pmatrix}$.

7.5.6. Let $P^{(2)}$ be the space of quadratic polynomials equipped with the inner product $\langle p, q \rangle = \int p(x) \, q(x) \, dx$. Find the adjoint of the derivative operator $D[p] = p'$ on $P^{(2)}$.


0  7.5.7.  Prove  that,  if  it  exists,  the  adjoint  of  a  linear  function  is  uniquely  determined  by  (7.80). 

0  7.5.8. Prove that (a) $(L + M)^* = L^* + M^*$, (b) $(c L)^* = c L^*$ for $c \in \mathbb{R}$, (c) $(L^*)^* = L$, (d) $(L^{-1})^* = (L^*)^{-1}$.


0  7.5.9. Let $L: U \to V$ be a linear function between inner product spaces. Prove that $u \in U$ solves the inhomogeneous linear system $L[u] = f$ if and only if

$$\langle u, L^*[v] \rangle = \langle\langle f, v \rangle\rangle \quad \text{for all} \quad v \in V. \tag{7.83}$$

Explain why Exercise 3.1.11 is a special case of this result. Remark. Equation (7.83) is known as the weak formulation of the linear system. It plays an essential role in the analysis of differential equations and their numerical approximations, [61].


0  7.5.10. Suppose $V, W$ are finite-dimensional inner product spaces with dual spaces $V^*, W^*$. Let $L: V \to W$ be a linear function, and let $\tilde L^*: W^* \to V^*$ denote the dual linear function, as in Exercise 7.2.30 (without the tilde), while $L^*: W \to V$ denotes its adjoint. (As noted above, the same notation denotes two mathematically different objects.) Prove that if we identify $V^* \simeq V$ and $W^* \simeq W$ using the linear isomorphism in Exercise 7.1.62, then the dual and adjoint functions are identified, $\tilde L^* \simeq L^*$, thus reconciling the unfortunate clash in notation. In particular, this includes the two possible interpretations of the transpose of a matrix.


Self-Adjoint  and  Positive  Definite  Linear  Functions 


Throughout  this  section  U  will  be  an  inner  product  space.  We  will  show  how  to  generalize 
the  notions  of  symmetric  and  positive  definite  matrices  to  linear  operators  on  U  in  a  natural 
fashion.  First,  we  define  the  analogue  of  a  symmetric  matrix. 


Definition 7.59. A linear function $J: U \to U$ is called self-adjoint if $J^* = J$. A self-adjoint linear function is positive definite, written $J > 0$, if

$$\langle u, J[u] \rangle > 0 \quad \text{for all} \quad 0 \neq u \in U. \tag{7.84}$$

In particular, if $J > 0$ then $\ker J = \{0\}$. (Why?) Thus, a positive definite linear system $J[u] = f$ with $f \in \operatorname{img} J$ must have a unique solution. The next result generalizes our basic observation that the Gram matrices $A^T A$ and $A^T C A$, cf. (3.62, 64), are symmetric and positive (semi-)definite.

Theorem 7.60. Let $L: U \to V$ be a linear map between inner product spaces with adjoint $L^*: V \to U$. Then the composite map $J = L^* \circ L: U \to U$ is self-adjoint. Moreover, $J$ is positive definite if and only if $\ker L = \{0\}$.

Proof: First, by Propositions 7.58 and 7.57,

$$J^* = (L^* \circ L)^* = L^* \circ (L^*)^* = L^* \circ L = J,$$

proving self-adjointness. Furthermore, for $u \in U$, the inner product

$$\langle u, J[u] \rangle = \langle u, L^*[L[u]] \rangle = \langle\langle L[u], L[u] \rangle\rangle = \| L[u] \|^2 > 0$$

is strictly positive, provided that $L[u] \neq 0$. Thus, if $\ker L = \{0\}$, then the positivity condition (7.84) holds, and conversely.  Q.E.D.


Let us specialize to the case of a linear function $L: \mathbb{R}^n \to \mathbb{R}^m$ that is represented by the $m \times n$ matrix $A$, so that $L[u] = A u$. When the Euclidean dot product is used on the two spaces, the adjoint $L^*$ is represented by the transpose $A^T$, and hence the map $J = L^* \circ L$ has matrix representation $J[u] = K u$, where $K = A^T A$. Therefore, in this case Theorem 7.60 reduces to our earlier Proposition 3.36, governing the positive definiteness of the Gram matrix product $A^T A$. If we change the inner product on the codomain to $\langle\langle w, \tilde w \rangle\rangle = w^T C \tilde w$ for some $C > 0$, then $L^*$ is represented by $A^T C$, and hence $J = L^* \circ L$ has matrix form $K = A^T C A$, which is the general symmetric, positive definite Gram matrix constructed in (3.64) that underlay our development of the equations of equilibrium in Chapter 6.

Finally, if we further replace the dot product on the domain space $\mathbb{R}^n$ by the alternative inner product $\langle v, \tilde v \rangle = v^T M \tilde v$ for $M > 0$, then, according to formula (7.82), the adjoint of $L$ has matrix form

$$A^* = M^{-1} A^T C, \quad \text{and therefore} \quad K = A^* A = M^{-1} A^T C A \tag{7.85}$$

is a self-adjoint, positive (semi-)definite matrix with respect to the weighted inner product on $\mathbb{R}^n$ prescribed by the positive definite matrix $M$. In this case, the positive definite, self-adjoint operator $J$ is no longer represented by a symmetric matrix. So, we did not quite tell the truth when we said we would allow only symmetric matrices to be positive definite; we really meant only self-adjoint matrices.

General self-adjoint matrices will be important in our discussion of the vibrations of mass-spring chains that have unequal masses. Extensions of these constructions to differential operators underlie the analysis of the boundary value problems of continuum mechanics, to be studied in [61].
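The claim that $K = M^{-1} A^T C A$ in (7.85) is self-adjoint but not symmetric can be seen on a small case. The sketch below assumes illustrative matrices $A$, $M$, $C$ (not taken from the text) and exact rational arithmetic; it checks self-adjointness through the equivalent condition that $M K$ is symmetric, which follows from (7.82) with $C$ replaced by $M$.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(M):
    """Exact inverse of a 2 x 2 matrix (assumed nonsingular)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Illustrative data (assumptions, not from the text); ker A = {0}.
A = [[1, 0], [2, 1], [0, -1]]
M = [[2, 1], [1, 3]]                     # weight on the domain R^2
C = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]    # weight on the codomain R^3

gram = matmul(matmul(transpose(A), C), A)   # A^T C A, symmetric
K = matmul(inverse_2x2(M), gram)            # K = M^{-1} A^T C A, as in (7.85)
MK = matmul(M, K)

print(K == transpose(K))     # False: K itself is not symmetric
print(MK == transpose(MK))   # True: K is self-adjoint for the M inner product

def weighted_quadratic(u):
    """<u, K u> = u^T M K u in the M inner product."""
    MKu = [sum(a * b for a, b in zip(row, u)) for row in MK]
    return sum(a * b for a, b in zip(u, MKu))

print(weighted_quadratic([1, -1]))   # 6, positive as Theorem 7.60 predicts
```

Positivity is only spot-checked on one vector here; the theorem guarantees it for every nonzero $u$ because this particular $A$ has trivial kernel.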


Exercises 

7.5.11. Show that the following linear transformations of $\mathbb{R}^2$ are self-adjoint with respect to the Euclidean dot product: (a) rotation through the angle $\theta = \pi$; (b) reflection about the line $y = x$; (c) the scaling map $S[x] = 3x$; (d) orthogonal projection onto the line $y = x$.

0  7.5.12. Let $M$ be a positive definite matrix. Show that $A: \mathbb{R}^n \to \mathbb{R}^n$ is self-adjoint with respect to the inner product $\langle v, w \rangle = v^T M w$ if and only if $M A$ is a symmetric matrix.

7.5.13. Prove that $A$ = [matrix illegible in the source] is self-adjoint with respect to the weighted inner product $\langle v, w \rangle = 2 v_1 w_1 + 3 v_2 w_2$. Hint: Use the criterion in Exercise 7.5.12.

7.5.14. Consider the weighted inner product $\langle v, w \rangle = v_1 w_1 + \tfrac{1}{2} v_2 w_2 + \tfrac{1}{3} v_3 w_3$ on $\mathbb{R}^3$. (a) What are the conditions on the entries of a $3 \times 3$ matrix $A$ in order that it be self-adjoint? Hint: Use the criterion in Exercise 7.5.12. (b) Write down an example of a non-diagonal self-adjoint matrix.

7.5.15. Answer Exercise 7.5.14 for the inner product based on $\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}$.

7.5.16.  True  or  false:  The  identity  transformation  is  self-adjoint  for  an  arbitrary  inner  product 
on  the  underlying  vector  space. 


7.5.17. True or false: A diagonal matrix is self-adjoint for an arbitrary inner product on $\mathbb{R}^n$.

7.5.18. Suppose $L: U \to U$ has an adjoint $L^*: U \to U$. (a) Show that $L + L^*$ is self-adjoint. (b) Show that $L \circ L^*$ is self-adjoint.

0  7.5.19. Suppose $J, M: U \to U$ are self-adjoint linear functions on an inner product space $U$. (a) Prove that $\langle J[u], u \rangle = \langle M[u], u \rangle$ for all $u \in U$ if and only if $J = M$. (b) Explain why this result is false if the self-adjointness hypothesis is dropped.


7.5.20. Prove that if $L: U \to U$ is an invertible linear transformation on an inner product space $U$, then the following three statements are equivalent: (a) $\langle L[u], L[v] \rangle = \langle u, v \rangle$ for all $u, v \in U$. (b) $\| L[u] \| = \| u \|$ for all $u \in U$. (c) $L^* = L^{-1}$. Hint: Use Exercise 7.5.19.


7.5.21. (a) Prove that the operation $M_a[u(x)] = a(x) u(x)$ of multiplication by a continuous function $a(x)$ defines a self-adjoint linear operator on the function space $C^0[a,b]$ with respect to the $L^2$ inner product. (b) Is $M_a$ also self-adjoint with respect to the weighted inner product $\langle\langle f, g \rangle\rangle = \int_a^b f(x) \, g(x) \, w(x) \, dx$?

C  7.5.22. A linear function $S: U \to U$ is called skew-adjoint if $S^* = -S$. (a) Prove that a skew-symmetric matrix is skew-adjoint with respect to the standard dot product on $\mathbb{R}^n$. (b) Under what conditions is $S[x] = A x$ skew-adjoint with respect to the inner product $\langle x, y \rangle = x^T M y$ on $\mathbb{R}^n$? (c) Let $L: U \to U$ have an adjoint $L^*$. Prove that $L - L^*$ is skew-adjoint. (d) Explain why every linear operator $L: U \to U$ that has an adjoint $L^*$ can be written as the sum of a self-adjoint and a skew-adjoint operator.

0  7.5.23. (a) Let $L_1: U \to V_1$ and $L_2: U \to V_2$ be linear maps between inner product spaces, with $V_1, V_2$ not necessarily the same. Let $J_1 = L_1^* \circ L_1$, $J_2 = L_2^* \circ L_2$. Show that the sum $J = J_1 + J_2$ can be written as a self-adjoint combination $J = L^* \circ L$ for some linear operator $L$. Hint: See Exercise 3.4.35 for the matrix case.


Minimization 

In Chapter 5, we learned how the solution to a linear algebraic system $K u = f$ with positive definite coefficient matrix $K$ can be characterized as the unique minimizer for the quadratic function $p(u) = \tfrac{1}{2} u^T K u - u^T f$. There is an analogous minimization principle that characterizes the solutions to linear systems defined by positive definite linear operators. This general result is of tremendous importance in the analysis of boundary value problems for differential equations, for both physical and mathematical reasons, and also inspires the finite element numerical solution algorithm, [61].

We  restrict  our  attention  to  real  linear  functions  on  real  vector  spaces  in  this  section. 


Theorem 7.61. Let $J: U \to U$ be a positive definite linear function on a real inner product space $U$. If $f \in \operatorname{img} J$, then the quadratic function

$$p(u) = \tfrac{1}{2} \langle u, J[u] \rangle - \langle u, f \rangle \tag{7.86}$$

has a unique minimizer $u = u^*$, which is the solution to the linear system $J[u^*] = f$.

Proof: The proof mimics that of its matrix counterpart in Theorem 5.2. Our assumption that $f \in \operatorname{img} J$ implies that there is a $u^* \in U$ such that $J[u^*] = f$. Thus, we can write

$$p(u) = \tfrac{1}{2} \langle u, J[u] \rangle - \langle u, J[u^*] \rangle = \tfrac{1}{2} \langle u - u^*, J[u - u^*] \rangle - \tfrac{1}{2} \langle u^*, J[u^*] \rangle, \tag{7.87}$$

where we used linearity, along with the fact that $J$ is self-adjoint to identify the terms $\langle u, J[u^*] \rangle = \langle u^*, J[u] \rangle$. Since $J > 0$, the first term on the right-hand side of (7.87) is always $\geq 0$; moreover, it equals its minimal value $0$ if and only if $u = u^*$. On the other hand, the second term does not depend upon $u$ at all, and hence is unaffected by variations in $u$. Therefore, to minimize $p(u)$, we must make the first term as small as possible, which is accomplished by setting $u = u^*$.  Q.E.D.
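The completion of the square behind (7.87) is compressed into a single step in the proof; written out, it might read as follows. This is only an expanded version of the same argument, using bilinearity and the self-adjointness identity $\langle u^*, J[u] \rangle = \langle u, J[u^*] \rangle$.

```latex
\begin{aligned}
\tfrac12 \langle u - u^*, J[u - u^*] \rangle
  &= \tfrac12 \langle u, J[u] \rangle
   - \tfrac12 \langle u, J[u^*] \rangle
   - \tfrac12 \langle u^*, J[u] \rangle
   + \tfrac12 \langle u^*, J[u^*] \rangle \\
  &= \tfrac12 \langle u, J[u] \rangle
   - \langle u, J[u^*] \rangle
   + \tfrac12 \langle u^*, J[u^*] \rangle .
\end{aligned}
```

Subtracting $\tfrac12 \langle u^*, J[u^*] \rangle$ from both sides recovers (7.87). Setting $u = u^*$ then shows that the minimum value is $p(u^*) = -\tfrac12 \langle u^*, J[u^*] \rangle = -\tfrac12 \langle u^*, f \rangle$.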


Remark. For linear functions given by matrix multiplication, positive definiteness automatically implies invertibility, and so the linear system $K u = f$ has a solution for every right-hand side. This is not so immediate when $J$ is a positive definite operator on an infinite-dimensional function space. Therefore, the existence of a solution or minimizer is a significant issue. And, in fact, many modern analytical existence results rely on the determination of suitable minimization principles. On the other hand, once existence is assured, uniqueness follows immediately from the positive definiteness of the operator $J$.


Theorem 7.62. Suppose $L: U \to V$ is a linear function between inner product spaces with $\ker L = \{0\}$ and adjoint function $L^*: V \to U$. Let $J = L^* \circ L: U \to U$ be the associated positive definite linear function. If $f \in \operatorname{img} J$, then the quadratic function

$$p(u) = \tfrac{1}{2} \| L[u] \|^2 - \langle u, f \rangle \tag{7.88}$$

has a unique minimizer $u^*$, which is the solution to the linear system $J[u^*] = f$.

Proof: It suffices to note that the quadratic term in (7.88) can be written in the alternative form

$$\| L[u] \|^2 = \langle\langle L[u], L[u] \rangle\rangle = \langle u, L^*[L[u]] \rangle = \langle u, J[u] \rangle.$$

Thus, (7.88) reduces to the quadratic function of the form (7.86) with $J = L^* \circ L$, and so Theorem 7.62 follows directly from Theorem 7.61.  Q.E.D.

Warning.  In  (7.88),  the  first  term  ||L[u]||2  is  computed  using  the  norm  based  on  the 
inner  product  on  V,  while  the  second  term  (u,  f )  employs  the  inner  product  on  U. 

Example 7.63. For a general positive definite matrix (7.85), the quadratic function (7.88) is computed with respect to the alternative inner product $\langle u, \tilde u \rangle = u^T M \tilde u$, so

$$p(u) = \tfrac{1}{2} \| A u \|^2 - \langle u, f \rangle = \tfrac{1}{2} (A u)^T C A u - u^T M f = \tfrac{1}{2} u^T (A^T C A) u - u^T (M f).$$

Theorem 7.62 tells us that the minimizer of the quadratic function is the solution to

$$A^T C A u = M f, \quad \text{which we rewrite as} \quad K u = M^{-1} A^T C A u = f.$$

This conclusion also follows from our earlier finite-dimensional Minimization Theorem 5.2.

In [61, 79], it is shown that the most important minimization principles that characterize solutions to the linear boundary value problems of physics and engineering all arise through this remarkably general mathematical construction.
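To see the weighted minimization principle of Example 7.63 in action, the following sketch solves $A^T C A u = M f$ for a small case and confirms that moving away from the solution increases $p(u)$. The matrices, the right-hand side `f`, and the helper names are all illustrative assumptions, and the $2 \times 2$ system is solved by Cramer's rule only because it keeps the example short.

```python
from fractions import Fraction

# Illustrative data (assumptions, not from the text).
A = [[1, 0], [2, 1], [0, -1]]   # 3 x 2 matrix with trivial kernel
c = [1, 2, 3]                   # diagonal entries of the codomain weight C
M = [[2, 1], [1, 3]]            # positive definite weight on the domain
f = [1, 0]

def apply(X, x):
    """Matrix-vector product X x."""
    return [sum(a * b for a, b in zip(row, x)) for row in X]

def p(u):
    """p(u) = (1/2) ||A u||^2 - <u, f>, with the C norm and M inner product."""
    Au = apply(A, u)
    quad = sum(ci * t * t for ci, t in zip(c, Au))        # (A u)^T C (A u)
    lin = sum(ui * t for ui, t in zip(u, apply(M, f)))    # u^T M f
    return Fraction(quad, 2) - lin

# Assemble G = A^T C A and the right-hand side M f.
G = [[sum(c[k] * A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
b = apply(M, f)

# Solve the 2 x 2 system G u = b by Cramer's rule.
det = Fraction(G[0][0] * G[1][1] - G[0][1] * G[1][0])
u_star = [(b[0] * G[1][1] - G[0][1] * b[1]) / det,
          (G[0][0] * b[1] - G[1][0] * b[0]) / det]

print(u_star, p(u_star))
# Every perturbation h raises p by (1/2) h^T G h > 0.
for h in ([1, 0], [0, 1], [-1, 2]):
    shifted = [s + t for s, t in zip(u_star, h)]
    print(h, p(shifted) - p(u_star))
```

The increase in $p$ under a perturbation $h$ equals $\tfrac12 h^T G h$, which is exactly the first term on the right-hand side of (7.87) written in matrix form.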


Exercises 


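7.5.24. Find the minimum value of $p(u) = \tfrac{1}{2} u^T K u - u^T f$ for $u \in \mathbb{R}^2$. [The entries of the $2 \times 2$ matrix and of the vector, written here as $K$ and $f$, are illegible in the source.]

7.5.25. Minimize the function $p(u) = \tfrac{1}{2} u^T \begin{pmatrix} 2 & -1 & 0 \\ -1 & 4 & -2 \\ 0 & -2 & 3 \end{pmatrix} u - u^T \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}$ for $u \in \mathbb{R}^3$.

7.5.26. Minimize $\| (2x - y, \ x + y)^T \|^2 - 6x$ over all $x, y$, where $\| \cdot \|$ denotes the Euclidean norm on $\mathbb{R}^2$.

7.5.27. Answer Exercise 7.5.26 for (a) the weighted norm $\| (x, y)^T \| = \sqrt{2x^2 + 3y^2}$; (b) the norm based on [matrix illegible in the source]; (c) the norm based on [matrix illegible in the source].

7.5.28. Let $L(x, y) = \begin{pmatrix} x - 2y \\ x + y \\ -x + 3y \end{pmatrix}$ and $f$ = [vector illegible in the source]. Minimize $p(x) = \tfrac{1}{2} \| L[x] \|^2 - \langle x, f \rangle$ using (a) the Euclidean inner products and norms on both $\mathbb{R}^2$ and $\mathbb{R}^3$; (b) the Euclidean inner product on $\mathbb{R}^2$ and the weighted norm $\| w \| = \sqrt{w_1^2 + 2 w_2^2 + 3 w_3^2}$ on $\mathbb{R}^3$; (c) the inner product given by the matrix with first row $(2, -1)$ [remaining entries illegible in the source] on $\mathbb{R}^2$ and the Euclidean norm on $\mathbb{R}^3$; (d) the inner product given by the same matrix on $\mathbb{R}^2$ and the weighted norm $\| w \| = \sqrt{w_1^2 + 2 w_2^2 + 3 w_3^2}$ on $\mathbb{R}^3$.

7.5.29. Find the minimum distance between the point $(1, 0, 0)^T$ and the plane $x + y - z = 0$ when distance is measured in (a) the Euclidean norm; (b) the weighted norm $\| w \| = \sqrt{w_1^2 + 2 w_2^2 + 3 w_3^2}$; (c) the norm based on the positive definite matrix [matrix illegible in the source].

0  7.5.30. How would you modify the statement of Theorem 7.62 if $\ker L \neq \{0\}$?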



Chapter  8 

Eigenvalues  and  Singular  Values 

So  far,  our  physical  applications  of  linear  algebra  have  concentrated  on  statics:  unchanging 
equilibrium  configurations  of  mass-spring  chains,  circuits,  and  structures,  all  modeled  by 
linear  systems  of  algebraic  equations.  It  is  now  time  to  set  the  universe  in  motion.  In 
general,  a  (continuous)  dynamical  system  refers  to  the  (differential)  equations  governing 
the  time-varying  motion  of  a  physical  system,  be  it  mechanical,  electrical,  chemical,  fluid, 
thermodynamical,  biological,  financial,  ...  .  Our  immediate  goal  is  to  solve  the  simplest 
class  of  continuous  dynamical  models,  which  are  first  order  autonomous  linear  systems  of 
ordinary  differential  equations. 

We begin with a very quick review of the scalar case, whose solutions are simple exponential functions. This inspires us to try to solve a vector-valued linear system by substituting a similar exponential solution formula. We are immediately led to the system of algebraic equations that define the eigenvalues and eigenvectors of the coefficient matrix. Thus, before we can make any progress in our study of differential equations, we need to learn about eigenvalues and eigenvectors, and that is the purpose of the present chapter. Dynamical systems are used to motivate the subject, but serious applications will be deferred until Chapter 10. Additional applications of eigenvalues and eigenvectors to linear iterative systems, stochastic processes, and numerical solution algorithms for linear algebraic systems form the focus of Chapter 9.

Each square matrix possesses a collection of one or more complex scalars, called eigenvalues, and associated vectors, called eigenvectors. From a geometrical viewpoint, the matrix defines a linear transformation on Euclidean space; the eigenvectors indicate the directions of pure stretch and the eigenvalues the extent of stretching. We will introduce the non-standard term “complete” to describe matrices whose (complex) eigenvectors form a basis of the underlying vector space. A more common name for such matrices is “diagonalizable”, because, when expressed in terms of its eigenvector basis, the matrix representing the corresponding linear transformation assumes a very simple diagonal form, facilitating the detailed analysis of its properties. A particularly important class consists of the symmetric matrices, whose eigenvectors form an orthogonal basis of $\mathbb{R}^n$; in fact, this is how orthogonal bases most naturally appear. Most matrices are complete; incomplete matrices are trickier to deal with, and we discuss their non-diagonal Schur decomposition and Jordan canonical form in Section 8.6.

A non-square matrix $A$ does not possess eigenvalues. In their place, one studies the eigenvalues of the associated square Gram matrix $K = A^T A$, whose square roots are known as the singular values of the original matrix. The corresponding singular value decomposition (SVD) supplies the final details for our understanding of the remarkable geometric structure governing matrix multiplication. The singular value decomposition is used to define the pseudoinverse of a matrix, which provides a mechanism for “inverting” non-square and singular matrices, and an alternative construction of least squares solutions to general linear systems. Singular values underlie the powerful method of modern statistical data analysis known as principal component analysis (PCA), which will be developed in the final section of this chapter and appears in an increasingly broad range of contemporary applications, including image processing, semantics, language and speech recognition, and machine learning.

Remark.  The  numerical  computation  of  eigenvalues  and  eigenvectors  is  a  challenging 
issue,  and  must  be  deferred  until  Section  9.5.  Unless  you  are  prepared  to  consult  that 
section  in  advance,  solving  the  computer-based  problems  in  this  chapter  will  require  access 
to  computer  software  that  can  accurately  compute  eigenvalues  and  eigenvectors. 


8.1  Linear  Dynamical  Systems 

Our  new  goal  is  to  solve  and  analyze  the  simplest  class  of  dynamical  systems,  namely  those 
modeled  by  first  order  linear  systems  of  ordinary  differential  equations.  We  begin  with 
a  thorough  review  of  the  scalar  case,  including  a  complete  investigation  into  the  stability 
of  their  equilibria  —  in  preparation  for  the  general  situation  to  be  treated  in  depth  in 
Chapter  10.  Readers  who  are  not  interested  in  such  motivational  material  may  skip  ahead 
to  Section  8.2  without  incurring  any  penalty. 


Scalar  Ordinary  Differential  Equations 


Consider the elementary first order scalar ordinary differential equation

$$\frac{du}{dt} = a\,u. \tag{8.1}$$

Here $a \in \mathbb{R}$ is a real constant, while the unknown $u(t)$ is a scalar function. As you no doubt
already learned, e.g., in [7, 22, 78], the general solution to (8.1) is an exponential function

$$u(t) = c\,e^{a t}. \tag{8.2}$$


The integration constant $c$ is uniquely determined by a single initial condition

$$u(t_0) = b \tag{8.3}$$

imposed at an initial time $t_0$. Substituting $t = t_0$ into the solution formula (8.2),

$$u(t_0) = c\,e^{a t_0} = b, \qquad \text{and so} \qquad c = b\,e^{-a t_0}.$$

We conclude that

$$u(t) = b\,e^{a (t - t_0)} \tag{8.4}$$

is the unique solution to the scalar initial value problem (8.1), (8.3).
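
As a quick numerical sanity check of formula (8.4), the following Python sketch builds the solution for given $a$, $t_0$, $b$ and compares a centered finite difference of $u$ with $a\,u$. The function name `solve_scalar_ivp` and the sample data are illustrative assumptions, not taken from the text.

```python
import math

def solve_scalar_ivp(a, t0, b):
    """Return u(t) = b * exp(a * (t - t0)), the solution of du/dt = a*u, u(t0) = b."""
    def u(t):
        return b * math.exp(a * (t - t0))
    return u

# Hypothetical sample data: du/dt = 0.5 u, u(2) = 4.
a, t0, b = 0.5, 2.0, 4.0
u = solve_scalar_ivp(a, t0, b)

# Check the initial condition, then the differential equation at one sample time
# using a centered difference approximation of du/dt.
h = 1e-6
t = 2.5
derivative = (u(t + h) - u(t - h)) / (2 * h)

print(u(t0))                 # value at the initial time
print(derivative, a * u(t))  # these two numbers should agree closely
```

The finite difference is only an approximation, so the two printed numbers agree up to a small discretization and round-off error rather than exactly.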


Example 8.1. The radioactive decay of an isotope, say uranium-238, is governed by the
differential equation

$$\frac{du}{dt} = -\gamma\,u.$$

Here $u(t)$ denotes the amount of the isotope remaining at time $t$; the coefficient $\gamma > 0$
governs the decay rate. The solution is an exponentially decaying function $u(t) = c\,e^{-\gamma t}$,
where $c = u(0)$ is the amount of radioactive material at the initial time $t_0 = 0$.

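The isotope’s half-life $t_\star$ is the time it takes for half of a sample to decay, that is, when
$u(t_\star) = \tfrac12\,u(0)$. To determine $t_\star$, we solve the algebraic equation

$$e^{-\gamma t_\star} = \tfrac12, \qquad \text{so that} \qquad t_\star = \frac{\log 2}{\gamma}.$$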
Figure 8.1. Solutions to $du/dt = a\,u$.


Thus, after an elapsed time of $t_\star$, half of the original material has decayed. After a further
$t_\star$, half of the remaining half has decayed, leaving a quarter of the original radioactive
material. And so on, so that at each integer multiple $n\,t_\star$, $n \in \mathbb{N}$, of the half-life, the
remaining amount of the isotope is $u(n\,t_\star) = 2^{-n}\,u(0)$.
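
The half-life relation can be checked numerically. The sketch below computes $\log 2/\gamma$ for a made-up decay rate and confirms that the amount remaining after $n$ half-lives matches $2^{-n}\,u(0)$; the names `half_life` and `remaining`, and the values of `gamma` and `u0`, are hypothetical.

```python
import math

def half_life(gamma):
    """Half-life log(2) / gamma of the decay equation du/dt = -gamma * u, gamma > 0."""
    return math.log(2) / gamma

def remaining(u0, gamma, t):
    """Amount left at time t, starting from u0 at time 0: u0 * exp(-gamma * t)."""
    return u0 * math.exp(-gamma * t)

gamma = 0.05   # hypothetical decay rate (per unit time)
u0 = 80.0      # hypothetical initial amount
t_star = half_life(gamma)

# After n half-lives, the exponential formula and the halving rule should agree.
for n in range(5):
    print(n, remaining(u0, gamma, n * t_star), u0 * 2.0 ** (-n))
```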

Returning to the general situation, let us make some elementary, but pertinent, observations
about this simplest linear dynamical system. First of all, since the equation is
homogeneous, the zero function $u(t) \equiv 0$, corresponding to $c = 0$ in the solution formula
(8.2), is a constant solution, known as an equilibrium solution, or fixed point, since
it does not depend on $t$. If the coefficient $a > 0$ is positive, then the solutions (8.2) are
exponentially growing (in absolute value) as $t \to \infty$. This implies that the zero equilibrium
solution is unstable. The initial condition $u(t_0) = 0$ produces the zero solution, but if
we make a tiny error (either physical, numerical, or mathematical) in the initial data, say
$u(t_0) = \varepsilon$, then the solution $u(t) = \varepsilon\,e^{a(t - t_0)}$ will eventually be far away from equilibrium.
More generally, any two solutions with very close, but not equal, initial data will eventually
become arbitrarily far apart: $|\,u_1(t) - u_2(t)\,| \to \infty$ as $t \to \infty$. One consequence is an
inherent difficulty in accurately computing the long-time behavior of the solution, since
small numerical errors may eventually have very large effects.

On the other hand, if $a < 0$, the solutions are exponentially decaying in time. In this
case, the zero equilibrium solution is stable, since a small change in the initial data will
have a negligible effect on the solution. In fact, the zero solution is globally asymptotically
stable. The phrase “asymptotically stable” indicates that solutions that start out near the
equilibrium point approach it in the large time limit; more specifically, if $u(t_0) = \varepsilon$ is small,
then $u(t) \to 0$ as $t \to \infty$. “Globally” implies that all solutions, no matter how large the
initial data, eventually approach equilibrium. In fact, for a linear system, the stability of
an equilibrium solution is inevitably a global phenomenon.

The borderline case is $a = 0$. Then all the solutions to (8.1) are constant. In this case,
the zero solution is stable (indeed, globally stable) but not asymptotically stable. The
solution to the initial value problem $u(t_0) = \varepsilon$ is $u(t) \equiv \varepsilon$. Therefore, a solution that starts
out near equilibrium will remain nearby, but will not asymptotically approach it. The
three qualitatively different possibilities are illustrated in Figure 8.1. A formal definition
of the various notions of stability for dynamical systems can be found in Definition 10.14.
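
The three cases can be summarized in a short Python sketch. Because the equation is linear, the difference of two solutions is itself a solution, so an initial discrepancy $\varepsilon$ evolves into $\varepsilon\,e^{a(t - t_0)}$; the function names and sample numbers below are illustrative assumptions.

```python
import math

def classify_zero_equilibrium(a):
    """Stability of the zero solution of du/dt = a*u, following the three cases in the text."""
    if a > 0:
        return "unstable"
    if a < 0:
        return "globally asymptotically stable"
    return "stable but not asymptotically stable"

def gap(a, eps, elapsed):
    """|u1(t) - u2(t)| after time `elapsed`, for initial data differing by eps."""
    return abs(eps) * math.exp(a * elapsed)

eps = 1e-6  # hypothetical tiny error in the initial data
for a in (0.5, 0.0, -0.5):
    print(a, classify_zero_equilibrium(a), gap(a, eps, 40.0))
```

With these sample values the initial error of one part in a million grows to several hundred when $a = 0.5$, stays unchanged when $a = 0$, and shrinks toward zero when $a = -0.5$.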
Exercises


8.1.1. Solve the following initial value problems:

(a) $\dfrac{du}{dt} = 5\,u$, $u(0) = -3$, (b) $\dfrac{du}{dt} = 2\,u$, $u(1) = 3$, (c) $\dfrac{du}{dt} = -3\,u$, $u(-1) = 1$.


8.1.2. Suppose a radioactive material has a half-life of 100 years. What is the decay rate $\gamma$?
Starting with an initial sample of 100 grams, how much will be left after 10 years?
after 100 years? after 1,000 years?

8.1.3. Carbon-14 has a half-life of 5730 years. Human skeletal fragments discovered in a cave
are analyzed and found to have only 6.24% of the carbon-14 that living tissue would have.
How old are the remains?

8.1.4. Prove that if $t_\star$ is the half-life of a radioactive material, then $u(n\,t_\star) = 2^{-n}\,u(0)$.
Explain the meaning of this equation in your own words.

8.1.5.  A  bacteria  colony  grows  according  to  the  equation  du/dt  =  1.3  u.  How  long  until  the 
colony  doubles?  quadruples?  If  the  initial  population  is  2,  how  long  until  the  population 
reaches  2  million? 

8.1.6. Deer in northern Minnesota reproduce according to the linear differential equation
$\dfrac{du}{dt} = .27\,u$, where $t$ is measured in years. If the initial population is $u(0) = 5{,}000$ and
the environment can sustain at most 1,000,000 deer, how long until the deer run out of
resources?

♦ 8.1.7. Consider the inhomogeneous differential equation $\dfrac{du}{dt} = a\,u + b$, where $a$, $b$ are constants.

(a) Show that $u_\star = -b/a$ is a constant equilibrium solution. (b) Solve the differential
equation. Hint: Look at the differential equation satisfied by $v = u - u_\star$.

(c) Discuss the stability of the equilibrium solution $u_\star$.

8.1.8. Use the method of Exercise 8.1.7 to solve the following initial value problems:

(a) $\dfrac{du}{dt} = 2\,u - 1$, $u(0) = 1$, (b) $\dfrac{du}{dt} = 5\,u + 15$, $u(1) = -3$, (c) $\dfrac{du}{dt} = -3\,u + 6$, $u(2) = -1$.


8.1.9.  The  radioactive  waste  from  a  nuclear  reactor  has  a  half-life  of  1000  years.  Waste  is 
continually  produced  at  the  rate  of  5  tons  per  year  and  stored  in  a  dump  site. 

(a)  Set  up  an  inhomogeneous  differential  equation,  of  the  form  in  Exercise  8.1.7,  to  model 
the  amount  of  radioactive  waste,  (b)  Determine  whether  the  amount  of  radioactive 
material  at  the  dump  increases  indefinitely,  decreases  to  zero,  or  eventually  stabilizes  at 
some  fixed  amount,  (c)  Starting  with  a  brand  new  site,  how  long  will  it  be  until  the  dump 
contains  100  tons  of  radioactive  material? 

♥ 8.1.10. Suppose that hunters are allowed to shoot a fixed number of the northern Minnesota
deer in Exercise 8.1.6 each year. (a) Explain why the population model takes the form
$\dfrac{du}{dt} = .27\,u - b$, where $b$ is the number killed yearly. (Ignore the seasonal aspects of hunting.)

(b) If $b = 1{,}000$, how long until the deer run out of resources? Hint: See Exercise 8.1.7.

(c) What is the maximal rate at which deer can be hunted without causing their
extinction?

8.1.11. (a) Write down the exact solution to the initial value problem $\dfrac{du}{dt} = \dfrac{2}{7}\,u$, $u(0) = \dfrac{1}{3}$.

(b) Suppose you make the approximation $u(0) = .3333$. At what point does your solution
differ from the true solution by 1 unit? by 1000 units? (c) Answer the same question if you
also approximate the coefficient in the differential equation by $\dfrac{du}{dt} = .2857\,u$.
♦ 8.1.12. Let $a$ be complex. Prove that $u(t) = c\,e^{a t}$ is the (complex) solution to our scalar
ordinary differential equation (8.1). Describe the asymptotic behavior of the solution as
$t \to \infty$, and the stability properties of the zero equilibrium solution.


♦ 8.1.13. (a) Prove that if $u_1(t)$ and $u_2(t)$ are any two distinct solutions to $\dfrac{du}{dt} = a\,u$ with $a > 0$,
then $|\,u_1(t) - u_2(t)\,| \to \infty$ as $t \to \infty$. (b) If $a = .02$ and $u_1(0) = .1$, $u_2(0) = .05$, how long
do you have to wait until $|\,u_1(t) - u_2(t)\,| > 1{,}000$?


First  Order  Dynamical  Systems 


The simplest class of dynamical system is a coupled system of $n$ first order ordinary differential
equations

$$\frac{du_1}{dt} = f_1(t, u_1, \ldots, u_n), \qquad \ldots, \qquad \frac{du_n}{dt} = f_n(t, u_1, \ldots, u_n).$$


The unknowns are the $n$ scalar functions $u_1(t), \ldots, u_n(t)$ depending on the scalar variable
$t \in \mathbb{R}$, which we usually view as time, whence the term “dynamical”. We will often write
the system in the equivalent vector form

$$\frac{d\mathbf{u}}{dt} = \mathbf{f}(t, \mathbf{u}),$$


in which $\mathbf{u}(t) = (u_1(t), \ldots, u_n(t))^T$ is the vector-valued solution, which serves to parameterize
a curve in $\mathbb{R}^n$, and $\mathbf{f}(t, \mathbf{u})$ is a vector-valued function with components $f_i(t, u_1, \ldots, u_n)$
for $i = 1, \ldots, n$. A dynamical system is called autonomous if the time variable $t$ does not
appear explicitly on the right-hand side, and so has the form

$$\frac{d\mathbf{u}}{dt} = \mathbf{f}(\mathbf{u}). \tag{8.8}$$

Dynamical systems of ordinary differential equations appear in an astonishing variety of
applications, including physics, astronomy, chemistry, biology, weather and climate, economics
and finance, and have been intensely studied since the very first days following the
invention of calculus.

In this text, we shall concentrate most of our attention on the very simplest case: a
homogeneous, linear, autonomous dynamical system, in which the right-hand side $\mathbf{f}(\mathbf{u})$ is
a linear function of $\mathbf{u}$ that is independent of the time $t$, and hence given by multiplication
by a constant matrix. Thus, the system takes the form

$$\frac{d\mathbf{u}}{dt} = A\,\mathbf{u}, \tag{8.9}$$


in  which  A  is  a  constant  n  x  n  matrix.  Writing  out  the  system  (8.9)  in  full  detail  produces 


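$$\begin{aligned}
\frac{du_1}{dt} &= a_{11}\,u_1 + a_{12}\,u_2 + \cdots + a_{1n}\,u_n, \\
\frac{du_2}{dt} &= a_{21}\,u_1 + a_{22}\,u_2 + \cdots + a_{2n}\,u_n, \\
&\;\;\vdots \\
\frac{du_n}{dt} &= a_{n1}\,u_1 + a_{n2}\,u_2 + \cdots + a_{nn}\,u_n,
\end{aligned} \tag{8.10}$$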
and we seek the solution $\mathbf{u}(t) = (u_1(t), \ldots, u_n(t))^T$. In the autonomous case, which is the
only type to be treated in depth here, the coefficients $a_{ij}$ are assumed to be (real) constants.
We are interested not only in the formulas for the solutions, but also in understanding their
qualitative and quantitative behavior.

Drawing our inspiration from the exponential solution formula (8.2) for the scalar version,
let us investigate whether the vector system admits any solutions of a similar exponential
form

$$\mathbf{u}(t) = e^{\lambda t}\,\mathbf{v}. \tag{8.11}$$

We assume that $\lambda$ is a constant scalar, so $e^{\lambda t}$ is the usual scalar exponential function, while
$\mathbf{v} \in \mathbb{R}^n$ is a constant vector. In other words, the components $u_i(t) = e^{\lambda t}\,v_i$ of our desired
solution are assumed to be constant multiples of the same exponential function. Since $\mathbf{v}$
is constant, the derivative of $\mathbf{u}(t)$ is easily found:


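$$\frac{d\mathbf{u}}{dt} = \frac{d}{dt}\bigl(e^{\lambda t}\,\mathbf{v}\bigr) = \lambda\,e^{\lambda t}\,\mathbf{v}.$$

On the other hand, since $e^{\lambda t}$ is a scalar, it commutes with matrix multiplication, and so

$$A\,\mathbf{u} = A\,e^{\lambda t}\,\mathbf{v} = e^{\lambda t} A\,\mathbf{v}.$$

Therefore, $\mathbf{u}(t)$ will solve the system (8.9) if and only if

$$\lambda\,e^{\lambda t}\,\mathbf{v} = e^{\lambda t} A\,\mathbf{v},$$

or, canceling the common scalar factor $e^{\lambda t}$,

$$\lambda\,\mathbf{v} = A\,\mathbf{v}.$$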


The result is a system of $n$ algebraic equations relating the vector $\mathbf{v}$ and the scalar $\lambda$. Analysis
of this system and its ramifications will be the topic of the remainder of this chapter.
A broad range of significant applications will appear in the subsequent two chapters.
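
The derivation above can be tested numerically. The sketch below takes a small matrix together with a scalar $\lambda$ and vector $\mathbf{v}$ satisfying $A\mathbf{v} = \lambda\mathbf{v}$, forms $\mathbf{u}(t) = e^{\lambda t}\mathbf{v}$, and compares a finite-difference derivative with $A\mathbf{u}$. The matrix, the pair $(\lambda, \mathbf{v})$, the comparison vector `w`, and all function names are illustrative assumptions chosen for this check.

```python
import math

def mat_vec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

def exponential_ansatz(lam, v):
    """Return the vector function u(t) = exp(lam * t) * v."""
    return lambda t: [math.exp(lam * t) * v_i for v_i in v]

def residual(A, u, t, h=1e-6):
    """Largest entry of |du/dt - A u| at time t, with du/dt from a centered difference."""
    du = [(p - m) / (2 * h) for p, m in zip(u(t + h), u(t - h))]
    return max(abs(d - r) for d, r in zip(du, mat_vec(A, u(t))))

A = [[3.0, 1.0], [1.0, 3.0]]   # hypothetical sample matrix
lam = 4.0
v = [1.0, 1.0]                 # satisfies A v = 4 v
w = [1.0, 0.0]                 # does not satisfy A w = 4 w

good = residual(A, exponential_ansatz(lam, v), 0.3)
bad = residual(A, exponential_ansatz(lam, w), 0.3)
print(good)   # near zero: the ansatz solves the system
print(bad)    # not small: the ansatz fails when A w differs from lam * w
```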


8.2  Eigenvalues  and  Eigenvectors 

In  view  of  the  preceding  motivational  section,  we  hereby  inaugurate  our  discussion  of 
eigenvalues  and  eigenvectors  by  stating  the  basic  definition. 

Definition 8.2. Let $A$ be an $n \times n$ matrix. A scalar $\lambda$ is called an eigenvalue of $A$ if there
is a non-zero vector $\mathbf{v} \neq \mathbf{0}$, called an eigenvector, such that

$$A\,\mathbf{v} = \lambda\,\mathbf{v}. \tag{8.12}$$


In geometric terms, the matrix $A$ has the effect of stretching the eigenvector $\mathbf{v}$ by an
amount specified by the eigenvalue $\lambda$.

Remark. The odd-looking terms “eigenvalue” and “eigenvector” are hybrid German-English
words. In the original German, they are Eigenwert and Eigenvektor, which can
be fully translated as “proper value” and “proper vector”. For some reason, the half-translated
terms have acquired a certain charm, and are now standard. The alternative
English terms characteristic value and characteristic vector can be found in some (mostly
older) texts. Oddly, the terms characteristic polynomial and characteristic equation, to be
defined below, are still used rather than “eigenpolynomial” and “eigenequation”.

The requirement that the eigenvector $\mathbf{v}$ be nonzero is important, since $\mathbf{v} = \mathbf{0}$ is a trivial
solution to the eigenvalue equation (8.12) for every scalar $\lambda$. Moreover, as far as solving
linear ordinary differential equations goes, the zero vector $\mathbf{v} = \mathbf{0}$ gives $\mathbf{u}(t) \equiv \mathbf{0}$, which is
certainly a solution, but one that we already knew.

The eigenvalue equation (8.12) is a system of linear equations for the entries of the
eigenvector $\mathbf{v}$, provided that the eigenvalue $\lambda$ is specified in advance, but is “mildly”
nonlinear as a combined system for $\lambda$ and $\mathbf{v}$. Gaussian Elimination per se will not solve
the problem, and we are in need of a new idea. Let us begin by rewriting the equation in
the form†

$$(A - \lambda\,I)\,\mathbf{v} = \mathbf{0}, \tag{8.13}$$

where $I$ is the identity matrix of the correct size, whereby $\lambda\,I\,\mathbf{v} = \lambda\,\mathbf{v}$. Now, for given
$\lambda$, equation (8.13) is a homogeneous linear system for $\mathbf{v}$, and always has the trivial zero
solution $\mathbf{v} = \mathbf{0}$. But we are specifically seeking a nonzero solution! According to Theorem 1.47,
a homogeneous linear system has a nonzero solution $\mathbf{v} \neq \mathbf{0}$ if and only if its
coefficient matrix, which in this case is $A - \lambda\,I$, is singular. This observation is the key to
resolving the eigenvector equation.

Theorem 8.3. A scalar $\lambda$ is an eigenvalue of the $n \times n$ matrix $A$ if and only if the matrix
$A - \lambda\,I$ is singular, i.e., of rank $< n$. The corresponding eigenvectors are the nonzero
solutions to the eigenvalue equation $(A - \lambda\,I)\,\mathbf{v} = \mathbf{0}$.


We  know  a  number  of  ways  to  characterize  singular  matrices,  including  the  vanishing 
determinant  criterion  given  in  (1.84).  Therefore,  the  following  result  is  immediate: 


Corollary 8.4. A scalar $\lambda$ is an eigenvalue of the matrix $A$ if and only if $\lambda$ is a solution
to the characteristic equation

$$\det(A - \lambda\,I) = 0. \tag{8.14}$$


In practice, when computing eigenvalues and eigenvectors by hand using exact arithmetic,
one first solves the characteristic equation (8.14) to obtain the set of eigenvalues.
Then, for each eigenvalue $\lambda$ one uses standard linear algebra methods, e.g., Gaussian Elimination,
to solve the corresponding linear system (8.13) for the associated eigenvector $\mathbf{v}$.


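Example 8.5. Consider the 2 × 2 matrix

$$A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}.$$

We compute the determinant in the characteristic equation using formula (1.38):

$$\det(A - \lambda\,I) = \det\begin{pmatrix} 3 - \lambda & 1 \\ 1 & 3 - \lambda \end{pmatrix} = (3 - \lambda)^2 - 1 = \lambda^2 - 6\lambda + 8.$$

Thus, the characteristic equation is a quadratic polynomial equation, and can be solved by
factorization:

$$\lambda^2 - 6\lambda + 8 = (\lambda - 4)(\lambda - 2) = 0.$$

We conclude that $A$ has two eigenvalues: $\lambda_1 = 4$ and $\lambda_2 = 2$.

For each eigenvalue, the corresponding eigenvectors are found by solving the associated
homogeneous linear system (8.13). For the first eigenvalue, the eigenvector equation is

$$(A - 4\,I)\,\mathbf{v} = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad \text{or} \qquad \begin{aligned} -x + y &= 0, \\ x - y &= 0. \end{aligned}$$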


† Note that it is not legal to write (8.13) in the form $(A - \lambda)\,\mathbf{v} = \mathbf{0}$ since we do not know how
to subtract a scalar $\lambda$ from a matrix $A$. Worse, if you type $A - \lambda$ in Matlab or Mathematica,
the result will be to subtract $\lambda$ from all the entries of $A$, which is not what we are after!
The general solution is

$$x = y = a, \qquad \text{so} \qquad \mathbf{v} = \begin{pmatrix} a \\ a \end{pmatrix} = a \begin{pmatrix} 1 \\ 1 \end{pmatrix},$$

where $a$ is an arbitrary scalar. Only the nonzero solutions‡ count as eigenvectors, and so
the eigenvectors for the eigenvalue $\lambda_1 = 4$ must have $a \neq 0$, i.e., they are all nonzero scalar
multiples of the basic eigenvector $\mathbf{v}_1 = (1, 1)^T$.


Remark. In general, if $\mathbf{v}$ is an eigenvector of $A$ for the eigenvalue $\lambda$, then so is every
nonzero scalar multiple of $\mathbf{v}$. In practice, we distinguish only linearly independent eigenvectors.
Thus, in this example, we shall say “$\mathbf{v}_1 = (1, 1)^T$ is the eigenvector corresponding
to the eigenvalue $\lambda_1 = 4$”, when we really mean that the set of eigenvectors for $\lambda_1 = 4$
consists of all nonzero scalar multiples of $\mathbf{v}_1$.

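Similarly, for the second eigenvalue $\lambda_2 = 2$, the eigenvector equation is

$$(A - 2\,I)\,\mathbf{v} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

The solution $(-a, a)^T = a\,(-1, 1)^T$ is the set of scalar multiples of the eigenvector
$\mathbf{v}_2 = (-1, 1)^T$. Therefore, the complete list of eigenvalues and eigenvectors (up to scalar
multiple) for this particular matrix is

$$\lambda_1 = 4, \quad \mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \lambda_2 = 2, \quad \mathbf{v}_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$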
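
The hand procedure of Example 8.5 (solve the characteristic equation, then solve each homogeneous system) can be mirrored in a few lines of Python for the 2 × 2 case. This is only a sketch of the exact-arithmetic approach for tiny matrices with real eigenvalues, not a general numerical method. The matrix below is the one consistent with the determinant $(3 - \lambda)^2 - 1$ and the eigenvector equations of the example; the function names are hypothetical.

```python
import math

def eigenvalues_2x2(A):
    """Real roots of det(A - lam I) = lam^2 - tr(A) lam + det(A) for a 2x2 matrix."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return (tr + root) / 2, (tr - root) / 2

def eigenvector_2x2(A, lam):
    """A nonzero solution of (A - lam I) v = 0, assuming lam is an exact eigenvalue."""
    (a, b), (c, d) = A
    # A - lam I is singular, so its rows are proportional; any vector
    # perpendicular to a nonzero row lies in the kernel.
    for row in ((a - lam, b), (c, d - lam)):
        if row != (0, 0):
            return (-row[1], row[0])
    return (1, 0)  # A - lam I is the zero matrix: every nonzero vector works

A = [[3, 1], [1, 3]]
for lam in eigenvalues_2x2(A):
    x, y = eigenvector_2x2(A, lam)
    Av = (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)
    print(lam, (x, y), Av == (lam * x, lam * y))
```

The eigenvectors returned are scalar multiples of $(1, 1)^T$ and $(-1, 1)^T$, which is all that can be expected since eigenvectors are determined only up to a nonzero factor.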


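Example 8.6. Consider the 3 × 3 matrix

$$A = \begin{pmatrix} 0 & -1 & -1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}.$$

Using the formula (1.88) for a 3 × 3 determinant, we compute the characteristic equation

$$\begin{aligned}
0 = \det(A - \lambda\,I) &= \det\begin{pmatrix} -\lambda & -1 & -1 \\ 1 & 2 - \lambda & 1 \\ 1 & 1 & 2 - \lambda \end{pmatrix} \\
&= (-\lambda)(2 - \lambda)^2 + (-1)\cdot 1 \cdot 1 + (-1)\cdot 1 \cdot 1 \\
&\quad - 1\cdot(2 - \lambda)(-1) - 1\cdot 1\cdot(-\lambda) - (2 - \lambda)\cdot 1\cdot(-1) \\
&= -\lambda^3 + 4\lambda^2 - 5\lambda + 2.
\end{aligned}$$

The resulting cubic polynomial can be factored:

$$-\lambda^3 + 4\lambda^2 - 5\lambda + 2 = -(\lambda - 1)^2(\lambda - 2) = 0.$$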


‡ If, at this stage, you end up with a linear system with only the trivial zero solution, you’ve done
something  wrong!  Either  you  don’t  have  a  correct  eigenvalue  —  maybe  you  made  a  mistake  setting 
up  and/or  solving  the  characteristic  equation  —  or  you’ve  made  an  error  solving  the  homogeneous 
eigenvector  system.  On  the  other  hand,  if  you  are  working  with  a  numerical  approximation  to 
the  eigenvalue,  then  the  resulting  numerical  homogeneous  linear  system  will  almost  certainly  not 
have  a  nonzero  solution,  and  therefore  a  completely  different  approach  must  be  taken  to  finding 
the  corresponding  eigenvector;  see  Sections  9.5  and  9.6. 
Most 3 × 3 matrices have three different eigenvalues, but this particular one has only two:
$\lambda_1 = 1$, which is called a double eigenvalue, since it is a double root of the characteristic
equation, along with a simple eigenvalue $\lambda_2 = 2$.

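The eigenvector equation (8.13) for the double eigenvalue $\lambda_1 = 1$ is

$$(A - I)\,\mathbf{v} = \begin{pmatrix} -1 & -1 & -1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

The general solution to this homogeneous linear system

$$\mathbf{v} = \begin{pmatrix} -a - b \\ a \\ b \end{pmatrix} = a \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}$$

depends upon two free variables: $y = a$ and $z = b$. Every nonzero solution forms a valid
eigenvector for the eigenvalue $\lambda_1 = 1$, and so the general eigenvector is any non-zero linear
combination of the two “basis eigenvectors” $\mathbf{v}_1 = (-1, 1, 0)^T$, $\hat{\mathbf{v}}_1 = (-1, 0, 1)^T$.

On the other hand, the eigenvector equation for the simple eigenvalue $\lambda_2 = 2$ is

$$(A - 2\,I)\,\mathbf{v} = \begin{pmatrix} -2 & -1 & -1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

The general solution

$$\mathbf{v} = \begin{pmatrix} -a \\ a \\ a \end{pmatrix} = a \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}$$

consists of all scalar multiples of the eigenvector $\mathbf{v}_2 = (-1, 1, 1)^T$.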

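In summary, the eigenvalues and (basis) eigenvectors for this matrix are

$$\begin{aligned}
\lambda_1 &= 1, & \mathbf{v}_1 &= \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \quad \hat{\mathbf{v}}_1 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \\
\lambda_2 &= 2, & \mathbf{v}_2 &= \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}.
\end{aligned} \tag{8.15}$$

This means that every eigenvector for the simple eigenvalue $\lambda_2 = 2$ is a nonzero scalar
multiple of $\mathbf{v}_2$, while every eigenvector for the double eigenvalue $\lambda_1 = 1$ is a nontrivial
linear combination $a\,\mathbf{v}_1 + b\,\hat{\mathbf{v}}_1$ of the two linearly independent eigenvectors $\mathbf{v}_1$, $\hat{\mathbf{v}}_1$.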
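
The eigenpairs found in Example 8.6 can be checked directly against the definition $A\mathbf{v} = \lambda\mathbf{v}$. The sketch below uses exact integer arithmetic; the helper names `mat_vec` and `is_eigenpair`, and the particular linear combinations tested, are our own illustrative choices.

```python
def mat_vec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def is_eigenpair(A, lam, v):
    """True when v is nonzero and A v equals lam * v exactly."""
    return any(x != 0 for x in v) and mat_vec(A, v) == [lam * x for x in v]

A = [[0, -1, -1],
     [1,  2,  1],
     [1,  1,  2]]
v1, v1_hat, v2 = [-1, 1, 0], [-1, 0, 1], [-1, 1, 1]

print(is_eigenpair(A, 1, v1), is_eigenpair(A, 1, v1_hat), is_eigenpair(A, 2, v2))

# A nontrivial combination within the eigenspace of the double eigenvalue
# is again an eigenvector for that eigenvalue ...
w = [3 * p - 2 * q for p, q in zip(v1, v1_hat)]
print(is_eigenpair(A, 1, w))

# ... whereas mixing eigenvectors of different eigenvalues is not an eigenvector
# for either of them.
mixed = [p + q for p, q in zip(v1, v2)]
print(is_eigenpair(A, 1, mixed), is_eigenpair(A, 2, mixed))
```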


In general, given a real eigenvalue $\lambda$, the corresponding eigenspace $V_\lambda \subset \mathbb{R}^n$ is the
subspace spanned by all its eigenvectors. Equivalently, the eigenspace is the kernel

$$V_\lambda = \ker(A - \lambda\,I). \tag{8.16}$$

Thus, $\lambda \in \mathbb{R}$ is an eigenvalue if and only if $V_\lambda \neq \{\mathbf{0}\}$ is a nontrivial subspace, and then
every nonzero element of $V_\lambda$ is a corresponding eigenvector. The most economical way to
indicate each eigenspace is by writing out a basis, as in (8.15), with $\mathbf{v}_1$, $\hat{\mathbf{v}}_1$ giving a basis for
the eigenspace $V_1$, while $\mathbf{v}_2$ is a basis for the eigenspace $V_2$. In particular, 0 is an eigenvalue
if and only if $\ker A \neq \{\mathbf{0}\}$, and hence $A$ is singular.
Proposition 8.7. A matrix is singular if and only if it has a zero eigenvalue.


Example 8.8. The characteristic equation of the matrix $A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -1 & 1 \\ 2 & 0 & 1 \end{pmatrix}$ is

$$0 = \det(A - \lambda\,I) = -\lambda^3 + \lambda^2 + 5\lambda + 3 = -(\lambda + 1)^2(\lambda - 3).$$

Again, there is a double eigenvalue $\lambda_1 = -1$ and a simple eigenvalue $\lambda_2 = 3$. However, in
this case the matrix

$$A - \lambda_1 I = A + I = \begin{pmatrix} 2 & 2 & 1 \\ 1 & 0 & 1 \\ 2 & 0 & 2 \end{pmatrix}$$

has a one-dimensional kernel, spanned by $\mathbf{v}_1 = (2, -1, -2)^T$. Thus, even though $\lambda_1$ is a
double eigenvalue, it admits only a one-dimensional eigenspace. The list of eigenvalues and
eigenvectors is, in a sense, incomplete:

$$\lambda_1 = -1, \quad \mathbf{v}_1 = \begin{pmatrix} 2 \\ -1 \\ -2 \end{pmatrix}, \qquad \lambda_2 = 3, \quad \mathbf{v}_2 = \begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix}.$$
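
The contrast between Examples 8.6 and 8.8 comes down to the dimension of $\ker(A - \lambda I)$, which equals $n$ minus the rank of $A - \lambda I$. The following sketch computes that dimension by Gaussian elimination in exact rational arithmetic. The function names are hypothetical, and the two matrices are entered on the assumption that they read as in the examples (the second is consistent with the displayed $A + I$).

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def eigenspace_dimension(A, lam):
    """dim ker(A - lam I) = n - rank(A - lam I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)

A_86 = [[0, -1, -1], [1, 2, 1], [1, 1, 2]]   # Example 8.6
A_88 = [[1, 2, 1], [1, -1, 1], [2, 0, 1]]    # Example 8.8

print(eigenspace_dimension(A_86, 1))    # double eigenvalue, two-dimensional eigenspace
print(eigenspace_dimension(A_88, -1))   # double eigenvalue, one-dimensional eigenspace
```

A dimension of zero would signal that the supplied scalar is not an eigenvalue at all.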


Example 8.9. Finally, the matrix $A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & -2 \\ 2 & 2 & -1 \end{pmatrix}$ has characteristic equation

$$0 = \det(A - \lambda\,I) = -\lambda^3 + \lambda^2 - 3\lambda - 5 = -(\lambda + 1)(\lambda^2 - 2\lambda + 5).$$

The linear factor yields the eigenvalue $-1$. The quadratic factor leads to two complex
roots, $1 + 2\,i$ and $1 - 2\,i$, which can be obtained via the quadratic formula. Hence $A$ has
one real and two complex eigenvalues:

$$\lambda_1 = -1, \qquad \lambda_2 = 1 + 2\,i, \qquad \lambda_3 = 1 - 2\,i.$$

On solving the associated linear system $(A + I)\,\mathbf{v} = \mathbf{0}$, the real eigenvalue $\lambda_1 = -1$ is found
to have corresponding eigenvector $\mathbf{v}_1 = (-1, 1, 1)^T$.

Complex eigenvalues are as important as real eigenvalues, and we need to be able to
handle them too. To find the corresponding eigenvectors, which will also be complex, we
need to solve the usual eigenvalue equation (8.13), which is now a complex homogeneous
linear system. For example, the eigenvectors for $\lambda_2 = 1 + 2\,i$ are found by solving

$$\bigl(A - (1 + 2\,i)\,I\bigr)\,\mathbf{v} = \begin{pmatrix} -2\,i & 2 & 0 \\ 0 & -2\,i & -2 \\ 2 & 2 & -2 - 2\,i \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

This linear system can be solved by Gaussian Elimination (with complex pivots). In this
case, a simpler strategy is to work directly: the first equation $-2\,i\,x + 2\,y = 0$ tells us that
$y = i\,x$, while the second equation $-2\,i\,y - 2\,z = 0$ says $z = -i\,y = x$. If we trust our
calculations so far, we do not need to solve the final equation $2\,x + 2\,y + (-2 - 2\,i)\,z = 0$,
since we know that the coefficient matrix is singular and hence this equation must be
a consequence of the first two. (However, it does serve as a useful check on our work.)
So, the general solution $\mathbf{v} = (x, i\,x, x)^T$ is an arbitrary constant multiple of the complex
eigenvector $\mathbf{v}_2 = (1, i, 1)^T$. The eigenvector equation for $\lambda_3 = 1 - 2\,i$ is similarly solved
for the third eigenvector $\mathbf{v}_3 = (1, -i, 1)^T$.
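Summarizing, the matrix under consideration has one real and two complex eigenvalues,
with three corresponding eigenvectors, each unique up to (complex) scalar multiple:

$$\lambda_1 = -1, \quad \mathbf{v}_1 = \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}, \qquad \lambda_2 = 1 + 2\,i, \quad \mathbf{v}_2 = \begin{pmatrix} 1 \\ i \\ 1 \end{pmatrix}, \qquad \lambda_3 = 1 - 2\,i, \quad \mathbf{v}_3 = \begin{pmatrix} 1 \\ -i \\ 1 \end{pmatrix}.$$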


Note  that  the  third  complex  eigenvalue  is  the  complex  conjugate  of  the  second,  and  the 
eigenvectors  are  similarly  related.  This  is  indicative  of  a  general  fact  for  real  matrices: 

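Proposition 8.10. If $A$ is a real matrix with a complex eigenvalue $\lambda = \mu + i\,\nu$ and
corresponding complex eigenvector $\mathbf{v} = \mathbf{x} + i\,\mathbf{y}$, then the complex conjugate $\overline{\lambda} = \mu - i\,\nu$ is
also an eigenvalue with complex conjugate eigenvector $\overline{\mathbf{v}} = \mathbf{x} - i\,\mathbf{y}$.

Proof: First take complex conjugates of the eigenvalue equation (8.12):

$$\overline{A}\,\overline{\mathbf{v}} = \overline{A\,\mathbf{v}} = \overline{\lambda\,\mathbf{v}} = \overline{\lambda}\,\overline{\mathbf{v}}.$$

Using the fact that a real matrix is unaffected by complex conjugation, $\overline{A} = A$, we conclude
$A\,\overline{\mathbf{v}} = \overline{\lambda}\,\overline{\mathbf{v}}$, which is the equation for the eigenvalue $\overline{\lambda}$ and eigenvector $\overline{\mathbf{v}} \neq \mathbf{0}$. Q.E.D.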
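
Proposition 8.10 can be illustrated with Python's built-in complex numbers. The sketch assumes the matrix of Example 8.9 is $A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & -2 \\ 2 & 2 & -1 \end{pmatrix}$, as recovered from the three equations displayed in that example, and checks that conjugating an eigenpair yields another eigenpair. Helper names are hypothetical.

```python
def mat_vec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def conjugate(v):
    """Entrywise complex conjugate of a vector."""
    return [complex(x).conjugate() for x in v]

def is_eigenpair(A, lam, v):
    """True when v is nonzero and A v equals lam * v (exact for these small integer entries)."""
    return any(x != 0 for x in v) and mat_vec(A, v) == [lam * x for x in v]

A = [[1, 2, 0],
     [0, 1, -2],
     [2, 2, -1]]
lam = 1 + 2j
v = [1, 1j, 1]

print(is_eigenpair(A, lam, v))                          # the pair from Example 8.9
print(is_eigenpair(A, lam.conjugate(), conjugate(v)))   # its complex conjugate
print(is_eigenpair(A, -1, [-1, 1, 1]))                  # the real eigenpair
```

Exact equality works here only because every entry is a small integer or Gaussian integer; for general floating-point data a tolerance would be needed.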

As  a  consequence,  when  dealing  with  real  matrices,  we  need  to  compute  the  eigenvectors 
for  only  one  of  each  complex  conjugate  pair  of  eigenvalues.  This  observation  effectively 
halves  the  amount  of  work  in  the  unfortunate  event  that  we  are  confronted  with  complex 
eigenvalues. 

The eigenspace associated with a complex eigenvalue $\lambda$ is the subspace $V_\lambda \subset \mathbb{C}^n$ spanned
by the corresponding (complex) eigenvectors. One might also consider complex eigenvectors
associated with a real eigenvalue, but this doesn’t add anything to the picture: they are
merely complex linear combinations of the real eigenvectors. Thus, we need to introduce
complex eigenvectors only when dealing with genuinely complex eigenvalues.

Remark. The reader may recall that we said that one should never use determinants in
practical computations. So why have we reverted to using determinants to find eigenvalues?
The truthful answer is that the practical computation of eigenvalues and eigenvectors never
resorts to the characteristic equation! The method is fraught with numerical traps and
inefficiencies when (a) computing the determinant leading to the characteristic equation,
then (b) solving the resulting polynomial equation, which is itself a nontrivial numerical
problem†, [8, 66], and, finally, (c) solving each of the resulting linear eigenvector systems.
Even worse, if we know only an approximation $\tilde{\lambda}$ to the true eigenvalue $\lambda$, the approximate
eigenvector system $(A - \tilde{\lambda}\,I)\,\mathbf{v} = \mathbf{0}$ will almost certainly have a nonsingular coefficient
matrix, and hence admits only the trivial solution $\mathbf{v} = \mathbf{0}$, which does not even qualify
as an eigenvector!
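
The last point is easy to see on a small case. The sketch below evaluates $\det(A - \lambda I)$ for a sample 2 × 2 matrix with eigenvalue 4, once at the exact eigenvalue and once at a slightly perturbed value, using exact rational arithmetic so that round-off plays no role. The matrix, the perturbation, and the function name are illustrative assumptions.

```python
from fractions import Fraction

def shifted_det_2x2(A, lam):
    """det(A - lam I) for a 2x2 matrix A."""
    (a, b), (c, d) = A
    return (a - lam) * (d - lam) - b * c

A = [[3, 1], [1, 3]]                 # sample matrix with eigenvalues 4 and 2

exact = shifted_det_2x2(A, Fraction(4))
perturbed = shifted_det_2x2(A, Fraction("4.0001"))

print(exact)      # zero: A - 4 I is singular, so nonzero eigenvectors exist
print(perturbed)  # nonzero: the perturbed system has only the zero solution
```

However small the perturbation, the determinant is no longer zero, so solving the perturbed homogeneous system returns only the zero vector.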

Nevertheless, the characteristic equation does give us important theoretical insight into
the structure of the eigenvalues of a matrix, and can be used when dealing with very
small matrices, e.g., 2 × 2 and 3 × 3, presuming exact arithmetic is employed. Numerical
algorithms for computing eigenvalues and eigenvectors are based on completely different
ideas, and will be deferred until Sections 9.5 and 9.6.


† In fact, one effective numerical strategy for finding the roots of a polynomial is to turn the
procedure on its head, and calculate the eigenvalues of a matrix whose characteristic equation is
the polynomial in question! See [66] for details.


Exercises 


8.2.1.  Find  the  eigenvalues  and  eigenvectors  of  the  following  matrices: 


M  (-2  ?)■  (0)  (;  f).  («>  (_1  !)•  M  (_!  ?) 
8.2.2. (a) Find the eigenvalues of the rotation matrix $R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$. For what values
of $\theta$ are the eigenvalues real?

(b) Explain why your answer gives an immediate solution to Exercise 1.5.7c.


8.2.3. Answer Exercise 8.2.2a for the reflection matrix $F_\theta = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}$.


8.2.4. Write down (a) a 2 × 2 matrix that has 0 as one of its eigenvalues and $(1, 2)^T$ as a
corresponding eigenvector; (b) a 3 × 3 matrix that has $(1, 2, 3)^T$ as an eigenvector for the
eigenvalue $-1$.


8.2.5. (a) Write out the characteristic equation for the matrix $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \alpha & \beta & \gamma \end{pmatrix}$.

(b) Show that, given any 3 numbers $a$, $b$, and $c$, there is a 3 × 3 matrix with characteristic
equation $-\lambda^3 + a\,\lambda^2 + b\,\lambda + c = 0$.


8.2.6.  Find  the  eigenvalues  and  eigenvectors  of  the  cross  product  matrix  A  = 


8.2.7.  Find  all  eigenvalues  and  eigenvectors  of  the  following  complex  matrices: 

/ 1  +  i  -1-  i 
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(b) 


-  i  -2 


,  (c) 


i -2  i +1 
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1  —  i  \ 


2  -  i  2  —  2  i 
1+  i  — 1  +  2  i  / 


8.2.8.  Find  all  eigenvalues  and  eigenvectors  of 

(a)  the  n  x  n  zero  matrix  O;  (b)  the  n  x  n  identity  matrix  I . 


8.2.9.  Find  the  eigenvalues  and  eigenvectors  of  an  n  x  n  matrix  with  every  entry  equal  to  1. 
Hint :  Try  with  n  =  2,3,  and  then  generalize. 

0 8.2.10. Let $A$ be a given square matrix. (a) Explain in detail why every nonzero scalar
multiple of an eigenvector of $A$ is also an eigenvector. (b) Show that every nonzero linear
combination of two eigenvectors $v$, $w$ corresponding to the same eigenvalue is also an
eigenvector. (c) Prove that a linear combination $c v + d w$, with $c, d \ne 0$, of two eigenvectors
corresponding to different eigenvalues is never an eigenvector.


0 8.2.11. Let $\lambda$ be a real eigenvalue of the real $n \times n$ matrix $A$, and $v_1, \ldots, v_k$ a basis for the
associated eigenspace $V_\lambda$. Suppose $w \in \mathbb{C}^n$ is a complex eigenvector, so $A w = \lambda w$. Prove
that $w = c_1 v_1 + \cdots + c_k v_k$ is a complex linear combination of the real eigenspace basis.
Hint: Look at the real and imaginary parts of the eigenvector equation.

8.2.12. True or false: If $v$ is a real eigenvector of a real matrix $A$, then a nonzero complex
multiple $w = c v$ for $c \in \mathbb{C}$ is a complex eigenvector of $A$.


0 8.2.13. Define the shift map $S \colon \mathbb{C}^n \to \mathbb{C}^n$ by $S(v_1, v_2, \ldots, v_{n-1}, v_n)^T = (v_2, v_3, \ldots, v_n, v_1)^T$.
(a) Prove that $S$ is a linear map, and write down its matrix representation $A$.

(b) Prove that $A$ is an orthogonal matrix. (c) Prove that the sampled exponential vectors
$\omega_0, \ldots, \omega_{n-1}$ defined in (5.102) form an eigenvector basis of $A$. What are the eigenvalues?


Basic  Properties  of  Eigenvalues 


If $A$ is an $n \times n$ matrix, then its characteristic polynomial is defined to be

$$p_A(\lambda) = \det(A - \lambda I) = c_n \lambda^n + c_{n-1} \lambda^{n-1} + \cdots + c_1 \lambda + c_0. \tag{8.17}$$

The fact that $p_A(\lambda)$ is a polynomial of degree $n$ is a consequence of the general determinantal
formula (1.87). Indeed, every term is prescribed by a permutation $\pi$ of the rows of the
matrix, and equals plus or minus a product of $n$ distinct matrix entries including one from
each row and one from each column. The term corresponding to the identity permutation
is obtained by multiplying the diagonal entries together, which, in this case, is

$$(a_{11} - \lambda)(a_{22} - \lambda) \cdots (a_{nn} - \lambda) = (-1)^n \lambda^n + (-1)^{n-1} (a_{11} + a_{22} + \cdots + a_{nn}) \lambda^{n-1} + \cdots. \tag{8.18}$$

All of the other terms have at most $n - 2$ diagonal factors $a_{ii} - \lambda$, and so are polynomials
of degree $\le n - 2$ in $\lambda$. Thus, (8.18) is the only summand containing the monomials $\lambda^n$
and $\lambda^{n-1}$, and so their respective coefficients are

$$c_n = (-1)^n, \qquad c_{n-1} = (-1)^{n-1} (a_{11} + a_{22} + \cdots + a_{nn}) = (-1)^{n-1} \operatorname{tr} A, \tag{8.19}$$

where $\operatorname{tr} A$, the sum of its diagonal entries, is called the trace of the matrix $A$. The other
coefficients $c_{n-2}, \ldots, c_1, c_0$ in (8.17) are more complicated combinations of the entries of
$A$. However, setting $\lambda = 0$ implies

$$p_A(0) = c_0 = \det A, \tag{8.20}$$

and hence the constant term in the characteristic polynomial equals the determinant of
the matrix. In particular, if $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is a $2 \times 2$ matrix, its characteristic polynomial
has the explicit form

$$p_A(\lambda) = \det(A - \lambda I) = \det \begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix} = \lambda^2 - (a + d)\lambda + (ad - bc) = \lambda^2 - (\operatorname{tr} A)\lambda + (\det A). \tag{8.21}$$
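The explicit $2 \times 2$ formula can be turned into a few lines of code. The following is only a minimal sketch: the function name `eigenvalues_2x2` and the two sample matrices are illustrative choices, not taken from the text. It solves the quadratic $\lambda^2 - (\operatorname{tr} A)\lambda + \det A = 0$ using complex arithmetic, so that non-real eigenvalues are handled as well.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of lambda^2 - (tr A) lambda + det A = 0 for A = [[a, b], [c, d]]."""
    trace = a + d
    det = a * d - b * c
    # Complex square root, so a negative discriminant is handled too.
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2

# Illustrative matrix (an assumption): rotation through 90 degrees.
lam1, lam2 = eigenvalues_2x2(0, -1, 1, 0)
print(lam1, lam2)

# Illustrative symmetric matrix (an assumption) with real eigenvalues.
mu1, mu2 = eigenvalues_2x2(3, 1, 1, 3)
print(mu1, mu2)
```

In both cases the sum of the two roots equals the trace and their product equals the determinant. As the text cautions, this approach is reasonable only for very small matrices.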

According to the Fundamental Theorem of Algebra, [26], every (complex) polynomial of
degree $n \ge 1$ can be completely factored, and so we can write the characteristic polynomial
(8.17) in factored form:

$$p_A(\lambda) = (-1)^n (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n). \tag{8.22}$$

The complex numbers $\lambda_1, \ldots, \lambda_n$, some of which may be repeated, are the roots of the
characteristic equation $p_A(\lambda) = 0$, and hence the eigenvalues of the matrix $A$. Therefore,
we immediately conclude:


Theorem  8.11.  An  n  x  n  matrix  possesses  at  least  one  and  at  most  n  distinct  complex 
eigenvalues. 
Most $n \times n$ matrices, meaning those for which the characteristic polynomial factors into
$n$ distinct factors, have exactly $n$ complex eigenvalues. More generally, an eigenvalue
$\lambda_j$ is said to have multiplicity $k$ if the factor $(\lambda - \lambda_j)$ appears exactly $k$ times in the
factorization (8.22) of the characteristic polynomial. An eigenvalue is simple if it has
multiplicity 1, double if it has multiplicity 2, and so on. In particular, $A$ has $n$ distinct
eigenvalues if and only if all its eigenvalues are simple. In all cases, when the repeated
eigenvalues are counted in accordance with their multiplicity, every $n \times n$ matrix has a
total of $n$ complex eigenvalues.

An example of a matrix with just one eigenvalue, of multiplicity $n$, is the $n \times n$ identity
matrix $I$, whose only eigenvalue is $\lambda = 1$. In this case, every nonzero vector in $\mathbb{R}^n$ is an
eigenvector of the identity matrix (why?), and so the eigenspace $V_1$ is all of $\mathbb{R}^n$. At the
other extreme, the $n \times n$ "bidiagonal" Jordan block matrix†

$$J_{a,n} = \begin{pmatrix} a & 1 & & & \\ & a & 1 & & \\ & & \ddots & \ddots & \\ & & & a & 1 \\ & & & & a \end{pmatrix} \tag{8.23}$$

also has only one eigenvalue, $\lambda = a$, again of multiplicity $n$. But in this case, $J_{a,n}$ has only
one eigenvector (up to scalar multiple), which is the first standard basis vector $e_1$, and so
its eigenspace is one-dimensional.
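To see concretely why the eigenspace collapses to a line, it may help to write out the case $n = 3$; this is only an illustration, and the general statement is the subject of Exercise 8.2.18. The eigenvector equation for $\lambda = a$ reads

$$(J_{a,3} - a I)\, v = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} v_2 \\ v_3 \\ 0 \end{pmatrix} = 0,$$

which forces $v_2 = v_3 = 0$ and leaves $v_1$ free, so that $v = v_1 e_1$.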

Remark. If $\lambda$ is a complex eigenvalue of multiplicity $k$ for the real matrix $A$, then its
complex conjugate $\bar\lambda$ also has multiplicity $k$. This is because complex conjugate roots of a
real polynomial necessarily appear with identical multiplicities.


Remark. If $n \le 4$, then one can, in fact, write down an explicit formula for the solution
to a polynomial equation of degree $n$, and hence explicit (but rather complicated and not
particularly helpful) formulas for the eigenvalues of general $2 \times 2$, $3 \times 3$, and $4 \times 4$ matrices.
As soon as $n \ge 5$, there is no explicit formula (at least in terms of radicals), and so one
must usually resort to numerical approximations. This remarkable and deep algebraic
result was proved in the early nineteenth century by the young Norwegian mathematician
Niels Henrik Abel, [26].


Proposition 8.12. A square matrix $A$ and its transpose $A^T$ have the same characteristic
equation, and hence the same eigenvalues with the same multiplicities.

Proof: This follows immediately from Proposition 1.56, that the determinant of a matrix
and its transpose are identical. Thus,

$$p_A(\lambda) = \det(A - \lambda I) = \det(A - \lambda I)^T = \det(A^T - \lambda I) = p_{A^T}(\lambda). \qquad \text{Q.E.D.}$$

Remark. While $A^T$ has the same eigenvalues as $A$, its eigenvectors are, in general, different.
An eigenvector $v$ of $A^T$, satisfying $A^T v = \lambda v$, is sometimes referred to as a left
eigenvector of $A$, since it satisfies $v^T A = \lambda v^T$. A more apt, albeit rather non-conventional,
name for $v$ that conforms with our nomenclature conventions would be co-eigenvector.
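The distinction between ordinary and left eigenvectors is easy to check by hand on a small case. The sketch below uses a hypothetical $2 \times 2$ matrix chosen for illustration (it does not appear in the text) and plain Python lists. The helper names `mat_vec`, `vec_mat`, and `transpose` are likewise assumptions.

```python
def mat_vec(m, v):
    """Matrix times column vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def vec_mat(v, m):
    """Row vector times matrix, i.e. v^T A."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Hypothetical example matrix with eigenvalues 1 and 3.
A = [[1, 2],
     [0, 3]]
lam = 1
v_right = [1, 0]   # ordinary eigenvector:  A v = lam v
v_left = [1, -1]   # left eigenvector:      A^T v = lam v

right_ok = mat_vec(A, v_right) == [lam * x for x in v_right]
left_ok = mat_vec(transpose(A), v_left) == [lam * x for x in v_left]
row_ok = vec_mat(v_left, A) == [lam * x for x in v_left]       # v^T A = lam v^T
left_is_right = mat_vec(A, v_left) == [lam * x for x in v_left]
print(right_ok, left_ok, row_ok, left_is_right)
```

Here the left eigenvector satisfies $v^T A = \lambda v^T$ but is not an eigenvector of $A$ itself, which illustrates that the two kinds of eigenvectors generally differ even though the eigenvalue is shared.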


† All non-displayed entries are zero.

If we explicitly multiply out the factored product (8.22) and equate the result to the
characteristic polynomial (8.17), we find that its coefficients $c_0, c_1, \ldots, c_{n-1}$ can be written
as certain polynomials of the roots, known as the elementary symmetric polynomials. The
first and last are of particular importance:

$$c_0 = \lambda_1 \lambda_2 \cdots \lambda_n, \qquad c_{n-1} = (-1)^{n-1} (\lambda_1 + \lambda_2 + \cdots + \lambda_n). \tag{8.24}$$

Comparison with our previous formulas (8.19, 20) for the coefficients $c_0$ and $c_{n-1}$ leads to
the following useful result.

Proposition 8.13. The sum of the eigenvalues of a square matrix $A$ equals its trace:

$$\lambda_1 + \lambda_2 + \cdots + \lambda_n = \operatorname{tr} A = a_{11} + a_{22} + \cdots + a_{nn}. \tag{8.25}$$

The product of the eigenvalues equals its determinant:

$$\lambda_1 \lambda_2 \cdots \lambda_n = \det A. \tag{8.26}$$

Keep in mind that, in evaluating (8.25, 26), one must add or multiply repeated eigenvalues
according to their multiplicity.

Example 8.14. The matrix $A$ considered in Example 8.8 has trace and determinant

$$\operatorname{tr} A = 1, \qquad \det A = 3,$$

which fix, respectively, the coefficient of $\lambda^2$ and the constant term in its characteristic
equation. This matrix has two distinct eigenvalues: $-1$, which is a double eigenvalue, and
$3$, which is simple. For this particular matrix, (8.25, 26) become

$$1 = \operatorname{tr} A = (-1) + (-1) + 3, \qquad 3 = \det A = (-1)(-1)\,3.$$

Note that the double eigenvalue contributes twice to both the sum and the product.
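The bookkeeping in this example can be replayed numerically. The sketch below uses a hypothetical $3 \times 3$ matrix chosen for illustration, one that happens to have trace 1, determinant 3, and eigenvalues $-1, -1, 3$. It is not claimed to be the matrix of Example 8.8, and the helper names are assumptions. Since a cubic is determined by its values at four points, agreement at four sample points confirms the factored form of the characteristic polynomial.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly_value(m, t):
    """Evaluate p_A(t) = det(A - t I) for a 3x3 matrix."""
    shifted = [[m[r][c] - (t if r == c else 0) for c in range(3)] for r in range(3)]
    return det3(shifted)

# Hypothetical sample matrix, chosen only for illustration.
A = [[1, 2, 1],
     [1, -1, 1],
     [2, 0, 1]]
eigenvalues = [-1, -1, 3]  # listed with multiplicity

# Compare p_A(t) with -(t + 1)^2 (t - 3) at four points.
factored_ok = all(
    char_poly_value(A, t) == -(t + 1) ** 2 * (t - 3) for t in (0, 1, 2, 4)
)

trace = sum(A[i][i] for i in range(3))
determinant = det3(A)
print(factored_ok, trace == sum(eigenvalues), determinant == math.prod(eigenvalues))
```

Dropping one copy of the double eigenvalue from the list would break both the trace and the determinant checks, which is the point of counting with multiplicity.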


Exercises 


8.2.14. (a) Compute the eigenvalues and corresponding eigenvectors of $A = \begin{pmatrix} 1 & 4 & 4 \\ 3 & -1 & 0 \\ 0 & 2 & 3 \end{pmatrix}$.

(b) Compute the trace of $A$ and check that it equals the sum of the eigenvalues. (c) Find
the determinant of $A$ and check that it is equal to the product of the eigenvalues.


8.2.15.  Verify  the  trace  and  determinant  formulas  (8.25,  26)  for  the  matrices  in  Exercise  8.2.1. 

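8.2.16. (a) Find the explicit formula for the characteristic polynomial
$\det(A - \lambda I) = -\lambda^3 + a\lambda^2 - b\lambda + c$ of a general $3 \times 3$ matrix. Verify that $a = \operatorname{tr} A$,
$c = \det A$. What is the formula for $b$? (b) Prove that if $A$ has eigenvalues $\lambda_1, \lambda_2, \lambda_3$, then
$a = \operatorname{tr} A = \lambda_1 + \lambda_2 + \lambda_3$, $b = \lambda_1 \lambda_2 + \lambda_1 \lambda_3 + \lambda_2 \lambda_3$, $c = \det A = \lambda_1 \lambda_2 \lambda_3$.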

8.2.17.  Prove  that  the  eigenvalues  of  an  upper  triangular  (or  lower  triangular)  matrix  are  its 
diagonal  entries. 


0 8.2.18. Let $J_{a,n}$ be the $n \times n$ Jordan block matrix (8.23). Prove that its only eigenvalue is
$\lambda = a$ and the only eigenvectors are the nonzero scalar multiples of the standard basis
vector $e_1$.

0 8.2.19. Suppose that $\lambda$ is an eigenvalue of $A$. (a) Prove that $c\lambda$ is an eigenvalue of the scalar
multiple $cA$. (b) Prove that $\lambda + d$ is an eigenvalue of $A + d\,I$. (c) More generally, $c\lambda + d$ is
an eigenvalue of $B = cA + d\,I$ for scalars $c$, $d$.

8.2.20. Show that if $\lambda$ is an eigenvalue of $A$, then $\lambda^2$ is an eigenvalue of $A^2$.

8.2.21. True or false: (a) If $\lambda$ is an eigenvalue of both $A$ and $B$, then it is an eigenvalue of the
sum $A + B$. (b) If $v$ is an eigenvector of both $A$ and $B$, then it is an eigenvector of $A + B$.

8.2.22. True or false: If $\lambda$ is an eigenvalue of $A$ and $\mu$ is an eigenvalue of $B$, then $\lambda\mu$ is an
eigenvalue of the matrix product $C = AB$.

0  8.2.23.  Let  A  and  B  be  n  x  n  matrices.  Prove  that  the  matrix  products  AB  and  BA  have  the 
same  eigenvalues.  Hint:  How  should  the  eigenvectors  be  related? 

0 8.2.24. (a) Prove that if $\lambda \ne 0$ is a nonzero eigenvalue of the nonsingular matrix $A$, then $1/\lambda$ is
an eigenvalue of $A^{-1}$. (b) What happens if $A$ has 0 as an eigenvalue?

0 8.2.25. (a) Prove that if $|\det A| > 1$, then $A$ has at least one eigenvalue with $|\lambda| > 1$.

(b) If $|\det A| < 1$, are all eigenvalues $|\lambda| < 1$? Prove or find a counterexample.

8.2.26. Prove that $A$ is a singular matrix if and only if 0 is an eigenvalue.

8.2.27. Prove that every nonzero vector $0 \ne v \in \mathbb{R}^n$ is an eigenvector of $A$ if and only if $A$ is a
scalar multiple of the identity matrix.

8.2.28.  How  many  unit  (norm  1)  eigenvectors  correspond  to  a  given  eigenvalue  of  a  matrix? 

8.2.29.  True  or  false:  (a)  Performing  an  elementary  row  operation  of  type  #1  does  not  change 
the  eigenvalues  of  a  matrix,  (b)  Interchanging  two  rows  of  a  matrix  changes  the  sign 

of  its  eigenvalues,  (c)  Multiplying  one  row  of  a  matrix  by  a  scalar  multiplies  one  of  its 
eigenvalues  by  the  same  scalar. 

8.2.30. (a) True or false: If $\lambda_1, v_1$ and $\lambda_2, v_2$ solve the eigenvalue equation (8.12) for a given
matrix $A$, so does $\lambda_1 + \lambda_2$, $v_1 + v_2$. (b) Explain what this has to do with linearity.

8.2.31. As in (4.35), an elementary reflection matrix has the form $Q = I - 2\,u u^T$, where
$u \in \mathbb{R}^n$ is a unit vector. (a) Find the eigenvalues and eigenvectors of the elementary


reflection  matrices  for  the  unit  vectors  (z) 


1 

0 


(m) 


(Hi) 


(  0\ 

1 

W 


(w) 


4.  \ 

V2 

0 

1 

'  y/2  ) 


\ 


(b)  What  are  the  eigenvalues  and  eigenvectors  of  a  general  elementary  reflection  matrix? 

0 8.2.32. Let $A$ and $B$ be similar matrices, so $B = S^{-1} A S$ for some nonsingular matrix $S$.

(a) Prove that $A$ and $B$ have the same characteristic polynomial: $p_B(\lambda) = p_A(\lambda)$.

(b) Explain why similar matrices have the same eigenvalues. (c) Do they have the same
eigenvectors? If not, how are their eigenvectors related? (d) Prove that the converse to part
(c) is false by showing that $\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$ and $\begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix}$ have the same eigenvalues, but are not
similar.


8.2.33. Let $A$ be a nonsingular $n \times n$ matrix with characteristic polynomial $p_A(\lambda)$.

(a) Explain how to construct the characteristic polynomial $p_{A^{-1}}(\lambda)$ of its inverse directly
from $p_A(\lambda)$. (b) Check your result when $A =$ (i) $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, (ii) $\begin{pmatrix} 1 & 4 & 4 \\ -2 & -1 & 0 \\ 0 & 2 & 3 \end{pmatrix}$.

0  8.2.34.  Prove  that  the  only  eigenvalue  of  a  nilpotent  matrix,  cf.  Exercise  1.3.12,  is  0. 
(The  converse  is  also  true;  see  Exercise  8.6.20.) 
8.2.35. Given an idempotent matrix, so that $P = P^2$, find all its eigenvalues and eigenvectors.

8.2.36.  (a)  Prove  that  every  real  3x3  matrix  has  at  least  one  real  eigenvalue. 

(b)  Find  a  real  4x4  matrix  with  no  real  eigenvalues. 

(c)  Can  you  find  a  real  5x5  matrix  with  no  real  eigenvalues? 

8.2.37. (a) Show that if $A$ is a matrix such that $A^4 = I$, then the only possible eigenvalues of
$A$ are $1$, $-1$, $i$, and $-i$.

(b) Give an example of a real matrix that has all four numbers as eigenvalues.

8.2.38. (a) Prove that if $\lambda$ is an eigenvalue of $A$, then $\lambda^n$ is an eigenvalue of $A^n$. (b) State and
prove a converse.

8.2.39.  True  or  false:  All  the  eigenvalues  of  an  n  x  n  permutation  matrix  are  real. 

8.2.40.  (a)  Show  that  if  all  the  row  sums  of  A  are  equal  to  1,  then  A  has  1  as  an  eigenvalue. 

(b)  Suppose  all  the  column  sums  of  A  are  equal  to  1.  Does  the  same  result  hold? 


0 8.2.41. Prove that if $v$ is an eigenvector of $A$ with eigenvalue $\lambda$ and $w$ is an eigenvector of $A^T$
with a different eigenvalue $\mu \ne \lambda$, then $v$ and $w$ are orthogonal vectors with respect to the

dot  product.  Illustrate  this  result  when  (i)  A 


(ii)  A 


(  5 
5 

V-2 


8.2.42. Let $Q$ be an orthogonal matrix. (a) Prove that if $\lambda$ is an eigenvalue, then so is $1/\lambda$.

(b) Prove that all its eigenvalues are complex numbers of modulus $|\lambda| = 1$. In particular,
the only possible real eigenvalues of an orthogonal matrix are $\pm 1$. (c) Suppose $v = x + i\,y$
is a complex eigenvector corresponding to a non-real eigenvalue. Prove that its real and
imaginary parts are orthogonal vectors having the same Euclidean norm.


0  8.2.43.  (a)  Prove  that  every  3x3  proper  orthogonal  matrix  has  +1  as  an  eigenvalue. 

(b)  True  or  false:  An  improper  3x3  orthogonal  matrix  has  —1  as  an  eigenvalue. 

0 8.2.44. (a) Show that the linear transformation defined by a $3 \times 3$ proper orthogonal matrix
corresponds to rotating through an angle around a line through the origin in $\mathbb{R}^3$, the axis
of the rotation. Hint: Use Exercise 8.2.43(a).


(b)  Find  the  axis  and  angle  of  rotation  of  the  orthogonal  matrix 
0 8.2.45. Prove that every proper affine isometry $F(x) = Q x + b$ of $\mathbb{R}^3$, where $\det Q = +1$, is one
of the following: (a) a translation $x + b$, (b) a rotation centered at some point of $\mathbb{R}^3$, or
(c) a screw motion consisting of a rotation around an axis followed by a translation in the
direction of the axis. Hint: Use Exercise 8.2.44.


8.2.46.  Suppose  Q  is  an  orthogonal  matrix,  (a)  Prove  that  K  =  2  I 
semi-definite  matrix,  (b)  Under  what  conditions  is  K  >  0? 


Q 


is  a  positive 


U 8.2.47. Let $M_n$ be the $n \times n$ tridiagonal matrix whose diagonal entries are all equal to 0
and whose sub- and super-diagonal entries all equal 1. (a) Find the eigenvalues and
eigenvectors of $M_2$ and $M_3$ directly. (b) Prove that the eigenvalues and eigenvectors of $M_n$
are explicitly given by

$$\lambda_k = 2 \cos \frac{k\pi}{n+1}, \qquad v_k = \left( \sin \frac{k\pi}{n+1},\ \sin \frac{2k\pi}{n+1},\ \ldots,\ \sin \frac{nk\pi}{n+1} \right)^T, \qquad k = 1, \ldots, n.$$

How do you know that there are no other eigenvalues?


0 8.2.48. Let $a, b \in \mathbb{R}$. Determine the eigenvalues and eigenvectors of the $n \times n$ tridiagonal
matrix with all diagonal entries equal to $a$ and all sub- and super-diagonal entries equal to
$b$. Hint: See Exercises 8.2.19 and 8.2.47.

22 8.2.49. Find a formula for the eigenvalues of the tricirculant $n \times n$ matrix $Z_n$ that has 1's on
the sub- and super-diagonals as well as its $(1, n)$ and $(n, 1)$ entries, while all other entries
are 0. Hint: Use Exercise 8.2.47 as a guide.


0 8.2.50. Let $A$ be an $n \times n$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_k$, and $B$ an $m \times m$ matrix with
eigenvalues $\mu_1, \ldots, \mu_l$. Show that the $(m+n) \times (m+n)$ block diagonal matrix $D = \begin{pmatrix} A & O \\ O & B \end{pmatrix}$
has eigenvalues $\lambda_1, \ldots, \lambda_k, \mu_1, \ldots, \mu_l$ and no others. How are the eigenvectors related?


22 8.2.51. Deflation: Suppose $A$ has eigenvalue $\lambda$ and corresponding eigenvector $v$. (a) Let $b$ be
any vector. Prove that the matrix $B = A - v\,b^T$ also has $v$ as an eigenvector, now with
eigenvalue $\lambda - \beta$, where $\beta = v \cdot b$. (b) Prove that if $\mu \ne \lambda - \beta$ is any other eigenvalue of $A$,
then it is also an eigenvalue of $B$. Hint: Look for an eigenvector of the form $w + c\,v$, where
$w$ is an eigenvector of $A$. (c) Given a nonsingular matrix $A$ with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$
and $\lambda_1 \ne \lambda_j$ for all $j \ge 2$, explain how to construct a deflated matrix $B$ whose eigenvalues
are $0, \lambda_2, \ldots, \lambda_n$. (d) Try out your method on the matrices


22 8.2.52. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ be a $2 \times 2$ matrix. (a) Prove that $A$ satisfies its own characteristic
equation, meaning $p_A(A) = A^2 - (\operatorname{tr} A)\,A + (\det A)\,I = O$. Remark. This result is a
special case of the Cayley-Hamilton Theorem, to be developed in Exercise 8.6.22.

(b) Prove the inverse formula $A^{-1} = \dfrac{(\operatorname{tr} A)\,I - A}{\det A}$ when $\det A \ne 0$.

(21 

(c)  Check  the  Cayley-Hamilton  and  inverse  formulas  when  A  =  (  ^  ^ 


The  Gershgorin  Circle  Theorem 

In  general,  precisely  computing  the  eigenvalues  of  a  matrix  is  not  easy,  and,  in  most 
cases,  must  be  done  through  a  numerical  eigenvalue  procedure;  see  Sections  9.5  and  9.6. 
In  certain  applications,  though,  we  may  not  require  exact  numerical  values,  but  only  their 
approximate  locations.  The  Gershgorin  Circle  Theorem ,  due  to  the  early-twentieth-century 
Russian  mathematician  Semyon  Gershgorin,  serves  to  restrict  the  eigenvalues  to  a  certain 
well-defined  region  in  the  complex  plane. 


Definition 8.15. Let $A$ be an $n \times n$ matrix, either real or complex. For each $1 \le i \le n$,
define the $i$th Gershgorin disk

$$D_i = \{\, |z - a_{ii}| \le r_i \mid z \in \mathbb{C} \,\}, \qquad \text{where} \qquad r_i = \sum_{\substack{j=1 \\ j \ne i}}^{n} |a_{ij}|. \tag{8.27}$$

The Gershgorin domain $D_A = \bigcup_{i=1}^{n} D_i \subset \mathbb{C}$ is the union of the Gershgorin disks.

Thus, the $i$th Gershgorin disk $D_i$ is centered at the $i$th diagonal entry $a_{ii}$ of $A$, and has
radius $r_i$ equal to the sum of the absolute values of the off-diagonal entries that are in its
$i$th row. We can now state the Gershgorin Circle Theorem.

Theorem  8.16.  All  real  and  complex  eigenvalues  of  the  matrix  A  lie  in  its  Gershgorin 
domain  D A  C  C. 
Figure 8.2. Gershgorin Disks and Eigenvalues.


Example 8.17. The matrix $A = \begin{pmatrix} 2 & -1 & 0 \\ 1 & 4 & -1 \\ -1 & -1 & -3 \end{pmatrix}$ has Gershgorin disks

$$D_1 = \{\, |z - 2| \le 1 \,\}, \qquad D_2 = \{\, |z - 4| \le 2 \,\}, \qquad D_3 = \{\, |z + 3| \le 2 \,\},$$

which are plotted in Figure 8.2. The eigenvalues of $A$ are

$$\lambda_1 = 3, \qquad \lambda_2 = \sqrt{10} = 3.1622\ldots, \qquad \lambda_3 = -\sqrt{10} = -3.1622\ldots.$$

Observe that $\lambda_1$ belongs to both $D_1$ and $D_2$, while $\lambda_2$ lies in $D_2$, and $\lambda_3$ is in $D_3$. We thus
confirm that all three eigenvalues are in the Gershgorin domain $D_A = D_1 \cup D_2 \cup D_3$.
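The disks in this example are simple to compute mechanically. The following sketch (the function names `gershgorin_disks` and `in_domain` are illustrative, not from the text) builds the center and radius of each disk from the rows of the matrix, and then checks that the three eigenvalues quoted above lie in the Gershgorin domain. A small tolerance is assumed so that points on a disk's boundary, such as $\lambda_1 = 3$ in $D_1$, are counted as inside.

```python
import math

def gershgorin_disks(matrix):
    """Return a list of (center, radius) pairs, one per row of a square matrix."""
    disks = []
    for i, row in enumerate(matrix):
        radius = sum(abs(entry) for j, entry in enumerate(row) if j != i)
        disks.append((row[i], radius))
    return disks

def in_domain(z, disks, tol=1e-12):
    """True if z lies in at least one closed disk (up to a small tolerance)."""
    return any(abs(z - center) <= radius + tol for center, radius in disks)

A = [[2, -1, 0],
     [1, 4, -1],
     [-1, -1, -3]]
disks = gershgorin_disks(A)
eigenvalues = [3.0, math.sqrt(10), -math.sqrt(10)]  # values quoted in the example

print(disks)
print([in_domain(lam, disks) for lam in eigenvalues])
```

The same functions accept complex entries and complex test points, since `abs` returns the modulus of a complex number.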


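Proof of Theorem 8.16: Let $v$ be an eigenvector of $A$ with eigenvalue $\lambda$. Let $u = v / \| v \|_\infty$
be the corresponding unit eigenvector with respect to the $\infty$ norm, so

$$\| u \|_\infty = \max \{\, |u_1|, \ldots, |u_n| \,\} = 1.$$

Let $u_i$ be an entry of $u$ that achieves the maximum: $|u_i| = 1$. Writing out the $i$th
component of the eigenvalue equation $A u = \lambda u$, we obtain

$$\sum_{j=1}^{n} a_{ij} u_j = \lambda u_i, \qquad \text{which we rewrite as} \qquad \sum_{j \ne i} a_{ij} u_j = (\lambda - a_{ii})\, u_i.$$

Therefore, since all $|u_j| \le 1$, while $|u_i| = 1$,

$$|\lambda - a_{ii}| = |(\lambda - a_{ii})\, u_i| = \Bigl| \sum_{j \ne i} a_{ij} u_j \Bigr| \le \sum_{j \ne i} |a_{ij}|\, |u_j| \le \sum_{j \ne i} |a_{ij}| = r_i.$$

This immediately implies that $\lambda \in D_i \subset D_A$ belongs to the $i$th Gershgorin disk. Q.E.D.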


According to Proposition 8.7, a matrix $A$ is singular if and only if it admits zero as an
eigenvalue. Thus, if its Gershgorin domain does not contain 0, it cannot be an eigenvalue,
and hence $A$ is necessarily invertible. The condition $0 \notin D_A$ requires that the matrix have
large diagonal entries, as quantified by the following definition.


Definition 8.18. A square matrix $A$ is called strictly diagonally dominant if

$$|a_{ii}| > \sum_{j \ne i} |a_{ij}| \qquad \text{for all} \qquad i = 1, \ldots, n. \tag{8.28}$$

In other words, strict diagonal dominance requires each diagonal entry to be larger, in
absolute value, than the sum of the absolute values of all the other entries in its row. For
example, the matrix $\begin{pmatrix} 3 & -1 & 1 \\ 1 & -4 & 2 \\ -2 & -1 & 5 \end{pmatrix}$ is strictly diagonally dominant since

$$|3| > |-1| + |1|, \qquad |-4| > |1| + |2|, \qquad |5| > |-2| + |-1|.$$

Diagonally dominant matrices appear frequently in numerical solution methods for both
ordinary and partial differential equations.

Theorem  8.19.  A  strictly  diagonally  dominant  matrix  is  nonsingular. 


Proof: The diagonal dominance inequalities (8.28) imply that the radius of the $i$th Gershgorin
disk is strictly less than the modulus of its center: $r_i < |a_{ii}|$. This implies that the
disk cannot contain 0; indeed, if $z \in D_i$, then, by the triangle inequality,

$$r_i \ge |z - a_{ii}| \ge |a_{ii}| - |z| > r_i - |z|, \qquad \text{and hence} \qquad |z| > 0.$$

Thus, 0 does not lie in the Gershgorin domain $D_A$, and so cannot be an eigenvalue. Q.E.D.


Warning.  The  converse  to  this  result  is  obviously  not  true;  there  are  plenty  of  nonsingular 
matrices  that  are  not  strictly  diagonally  dominant. 
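Condition (8.28) is a row-by-row test, which makes it straightforward to check in code. The sketch below is a minimal illustration: the function name `is_strictly_diagonally_dominant` and the second test matrix are assumptions introduced here. It confirms the example given after Definition 8.18 and exhibits one nonsingular matrix that fails the test, in line with the warning above.

```python
def is_strictly_diagonally_dominant(matrix):
    """Check |a_ii| > sum of |a_ij| over j != i, for every row i."""
    return all(
        abs(row[i]) > sum(abs(entry) for j, entry in enumerate(row) if j != i)
        for i, row in enumerate(matrix)
    )

# The example matrix following Definition 8.18.
K = [[3, -1, 1],
     [1, -4, 2],
     [-2, -1, 5]]

# Hypothetical 2x2 matrix: nonsingular (determinant -2) but not dominant.
B = [[1, 2],
     [3, 4]]
det_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]

print(is_strictly_diagonally_dominant(K))          # dominance holds for K
print(is_strictly_diagonally_dominant(B), det_B)   # fails for B, yet det B != 0
```

Note that the inequality is strict: a row in which the diagonal entry merely equals the off-diagonal sum does not pass the test.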


Exercises 


8.2.53. For each of the following matrices,

(i) find all Gershgorin disks; (ii) plot the Gershgorin domain in the complex plane;
(iii) compute the eigenvalues and confirm the truth of the Circle Theorem 8.16:

8.2.54. True or false: The Gershgorin domain of the transpose of a matrix $A^T$ is the same as
the Gershgorin domain of the matrix $A$, that is, $D_{A^T} = D_A$.

0 8.2.55. (i) Explain why the eigenvalues of $A$ must lie in its refined Gershgorin domain
$D_A^* = D_{A^T} \cap D_A$. (ii) Find the refined Gershgorin domains for each of the matrices in
Exercise 8.2.53 and confirm the result in part (i).

8.2.56.  True  or  false:  (a)  A  positive  definite  matrix  is  strictly  diagonally  dominant. 

(b)  A  strictly  diagonally  dominant  matrix  is  positive  definite. 

0  8.2.57.  Prove  that  if  K  is  symmetric,  strictly  diagonally  dominant,  and  each  diagonal  entry  is 
positive,  then  K  is  positive  definite. 

8.2.58.  (a)  Write  down  an  invertible  matrix  A  whose  Gershgorin  domain  contains  0. 

(b)  Can  you  find  an  example  that  is  also  strictly  diagonally  dominant? 
8.3 Eigenvector Bases

Most of the vector space bases that play a distinguished role in applications are assembled
from the eigenvectors of a particular matrix. In this section, we show that the eigenvectors
of a "complete" matrix automatically form a basis for $\mathbb{R}^n$, or, in the complex case, $\mathbb{C}^n$. In
the following subsection, we use the eigenvector basis to rewrite the linear transformation
determined by the matrix in a simple diagonal form, hence the alternative more common
term "diagonalizable" for such matrices. The most important cases, symmetric and
positive definite matrices, will be treated in the following section.

The  first  task  is  to  show  that  eigenvectors  corresponding  to  distinct  eigenvalues  are 
automatically  linearly  independent. 

Lemma 8.20. If $\lambda_1, \ldots, \lambda_k$ are distinct eigenvalues of a matrix $A$, so $\lambda_i \ne \lambda_j$ when $i \ne j$,
then the corresponding eigenvectors $v_1, \ldots, v_k$ are linearly independent.

Proof: The result is proved by induction on the number of eigenvalues. The case $k = 1$ is
immediate, since an eigenvector cannot be zero. Assume that we know that the result is
valid for $k - 1$ eigenvalues. Suppose we have a vanishing linear combination:

$$c_1 v_1 + \cdots + c_{k-1} v_{k-1} + c_k v_k = 0. \tag{8.29}$$

Let us multiply this equation by the matrix $A$:

$$A(c_1 v_1 + \cdots + c_{k-1} v_{k-1} + c_k v_k) = c_1 A v_1 + \cdots + c_{k-1} A v_{k-1} + c_k A v_k = c_1 \lambda_1 v_1 + \cdots + c_{k-1} \lambda_{k-1} v_{k-1} + c_k \lambda_k v_k = 0.$$

On the other hand, if we multiply the original equation (8.29) by $\lambda_k$, we also have

$$c_1 \lambda_k v_1 + \cdots + c_{k-1} \lambda_k v_{k-1} + c_k \lambda_k v_k = 0.$$

On subtracting this from the previous equation, the final terms cancel, and we are left with
the equation

$$c_1 (\lambda_1 - \lambda_k)\, v_1 + \cdots + c_{k-1} (\lambda_{k-1} - \lambda_k)\, v_{k-1} = 0.$$

This is a vanishing linear combination of the first $k - 1$ eigenvectors, and so, by our induction
hypothesis, can happen only if all the coefficients are zero:

$$c_1 (\lambda_1 - \lambda_k) = 0, \qquad \ldots, \qquad c_{k-1} (\lambda_{k-1} - \lambda_k) = 0.$$

The eigenvalues were assumed to be distinct, and consequently $c_1 = \cdots = c_{k-1} = 0$.
Substituting these values back into (8.29), we find that $c_k v_k = 0$, and so $c_k = 0$ also,
since the eigenvector $v_k \ne 0$. Thus we have proved that (8.29) holds if and only if $c_1 =
\cdots = c_k = 0$, which implies the linear independence of the eigenvectors $v_1, \ldots, v_k$. This
completes the induction step. Q.E.D.

The  most  important  consequence  of  this  result  concerns  when  a  matrix  has  the  maximum 
allotment  of  eigenvalues. 

Theorem 8.21. If the $n \times n$ real matrix $A$ has $n$ distinct real eigenvalues $\lambda_1, \ldots, \lambda_n$,
then the corresponding real eigenvectors $v_1, \ldots, v_n$ form a basis of $\mathbb{R}^n$. If $A$ (which may
now be either a real or a complex matrix) has $n$ distinct complex eigenvalues, then the
corresponding eigenvectors $v_1, \ldots, v_n$ form a basis of $\mathbb{C}^n$.

For instance, the $2 \times 2$ matrix in Example 8.5 has two distinct real eigenvalues, and its
two independent eigenvectors form a basis of $\mathbb{R}^2$. The $3 \times 3$ matrix in Example 8.9 has
three distinct complex eigenvalues, and its eigenvectors form a basis for $\mathbb{C}^3$. If a matrix
has multiple eigenvalues, then there may or may not be an eigenvector basis of $\mathbb{R}^n$ (or $\mathbb{C}^n$).
The matrix in Example 8.6 admits an eigenvector basis, whereas the matrix in Example 8.8
does not. In general, it can be proved† that the dimension of an eigenspace is less than or
equal to the corresponding eigenvalue's multiplicity. In particular, every simple eigenvalue
has a one-dimensional eigenspace, and hence, up to scalar multiple, only one associated
eigenvector.

Definition  8.22.  An  eigenvalue  A  of  a  matrix  A  is  called  complete  if  the  corresponding 
eigenspace  Vx  =  ker  ( A  —  A  I )  has  the  same  dimension  as  its  multiplicity.  The  matrix  A  is 
said  to  be  complete  if  all  its  eigenvalues  are  complete. 

Note  that  a  simple  eigenvalue  is  automatically  complete,  since  its  eigenspace  is  the  one¬ 
dimensional  subspace  or  eigenline  spanned  by  the  corresponding  eigenvector.  Thus,  only 
multiple  eigenvalues  can  cause  a  matrix  to  be  incomplete. 

Remark. The multiplicity of an eigenvalue $\lambda$ is sometimes referred to as its algebraic
multiplicity. The dimension of the eigenspace $V_\lambda$ is its geometric multiplicity, and so
completeness requires that the two multiplicities be equal. The word "complete" is not
standard, but has been chosen because it can be used to describe both matrices and their
individual eigenvalues. Alternative terms used to describe complete matrices include
perfect, semi-simple, and, as we discuss shortly, diagonalizable.
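
The comparison of the two multiplicities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `rank` and `geometric_multiplicity` are hypothetical, the matrices have rational entries, and the eigenvalue and its algebraic multiplicity are supplied by hand rather than computed.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    pivots = 0
    for col in range(n_cols):
        # Look for a nonzero entry in this column at or below the next pivot row.
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def geometric_multiplicity(A, lam):
    """dim ker(A - lam I) = n - rank(A - lam I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)


# Both matrices have the double eigenvalue 1 (algebraic multiplicity 2).
shear = [[1, 1], [0, 1]]
identity = [[1, 0], [0, 1]]

print(geometric_multiplicity(shear, 1))     # smaller than 2: eigenvalue is incomplete
print(geometric_multiplicity(identity, 1))  # equal to 2: eigenvalue is complete
```

Exact rational arithmetic is used so that the rank is not affected by rounding error; a floating-point version would need a tolerance when testing for zero pivots.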


Theorem 8.23. An $n \times n$ real or complex matrix $A$ is complete if and only if its
eigenvectors span $\mathbb{C}^n$. In particular, an $n \times n$ matrix that has $n$ distinct eigenvalues is complete.


Or, stated another way, a matrix is complete if and only if its eigenvectors form a basis
of $\mathbb{C}^n$. Most matrices are complete. Incomplete $n \times n$ matrices, which have fewer than $n$
linearly independent complex eigenvectors, are less pleasant to deal with, and we relegate
most of the messy details to Section 8.6.


Remark. We already noted that complex eigenvectors of a real matrix always appear
in conjugate pairs: $v = x \pm i y$. If the matrix is complete, then it can be shown that
its real eigenvectors combined with the real and imaginary parts of its complex conjugate
eigenvectors form a real basis for $\mathbb{R}^n$. (See Exercise 8.3.12 for the underlying principle.)
For instance, the complex eigenvectors of the $3 \times 3$ matrix appearing in Example 8.9 are


The  vectors 


consisting  of  its  real  eigenvector 


and the real and imaginary parts of its complex eigenvectors, form a basis for $\mathbb{R}^3$.


Exercises 


8.3.1.  Which  of  the  following  are  complete  eigenvalues  for  the  indicated  matrix?  What  is  the 
dimension  of  the  associated  eigenspace?  (a)  3,  (  ^  ^  )  5  (b)  2,  2  1 


0  2 


† This follows from the Jordan Canonical Form Theorem 8.51.
(0  0

-i\ 

(c)  r 

1  1 

0 

5 

Vi  o 

(V 

( -i 

l 

0\ 

(f)  -i, 

—  i 

l 

-1 

V  o 

0 

-i  / 

(1 

1 

(d)  1, 

1 

1  o 

VI 

i  2y 

/  1  0 

(g)  - 

2, 

0  1 

-1  1 

V  1  0 


(e) 


1  X\ 
0  1 

1  0 

0  1 ) 


( h ) 


1  -4 

0  -1 
0  4 

/I  -1 
0  1 
0  0 
0  0 
Vo  0 


8.3.2. Find the eigenvalues and a basis for each of the eigenspaces of the following matrices.
Which are complete?

3  -2 

4  -1 


(a) 


4 

1 


4 

0 


(b) 


6  -8 
4  -6 


(c) 


(d) 


i 

1 


1 

i 


/  4 


(e) 


0 

Vi 


i 

3 

1 


1\ 

0 

2/ 


(0 


/  —6 
-4 

V  4 


0 

2 

0 


— 8\ 
-4 

oy 


(g) 


( 


\ 


2  1 
5  -3 
5  -1 


1\ 
6 
4y 


(h) 


/ 

l 

0 

0 

°\ 

(-1 

0 

1 

2  \ 

0 

1 

0 

0 

0 

1 

0 

1 

-1 

1 

-1 

0 

5  W 

-1 

-4 

1 

-2 

V 

1 

0 

-1 

o) 

^  0 

1 

0 

8.3.3. Which of the following matrices admit eigenvector bases of $\mathbb{R}^n$? For those that do,
exhibit such a basis. If not, what is the dimension of the subspace of $\mathbb{R}^n$ spanned by the

/I  -2  0\ 


eigenvectors?  (a) 


1  3 
3  1 


(*>) 


1 

-3 


3 

1 


(c) 


1 

0 


3 

1 


(d) 


/ 1  -2  0\ 

(2  0  0\ 

fo  0  -1\ 

(e) 

0-1  0  ,  (f) 

1  -1  1 

-  (g) 

0  1  0 

.  (A) 

V0  -4  -l) 

U  i  -i) 

1  o  oy 

0  -1 
\4  -4 

/0 
0 
1 

VI 


0 

-iy 
0  -1 
1  0 
0  -1 
0  1 


1\ 
1 
0 
oy 


8.3.4. Answer Exercise 8.3.3 with $\mathbb{R}^n$ replaced by $\mathbb{C}^n$.


8.3.5.  (a)  Give  an  example  of  a  3  x  3  matrix  with  1  as  its  only  eigenvalue,  and  only  one  linearly 
independent  eigenvector,  (b)  Find  one  that  has  two  linearly  independent  eigenvectors. 

8.3.6.  True  or  false:  (a)  Every  diagonal  matrix  is  complete. 

(b)  Every  upper  triangular  matrix  is  complete. 

8.3.7. Prove that if $A$ is a complete matrix, then so is $cA + dI$, where $c, d$ are any scalars.

8.3.8. (a) Prove that if $A$ is complete, then so is $A^2$.

(b) Give an example of an incomplete matrix $A$ such that $A^2$ is complete.


8.3.9.  Let  U  be  an  upper  triangular  matrix  with  all  its  diagonal  entries  equal.  Prove  that 
U  is  complete  if  and  only  if  U  is  a  diagonal  matrix. 

♦ 8.3.10. Suppose $v_1, \ldots, v_n$ is an eigenvector basis for the complete matrix $A$, with $\lambda_1, \ldots, \lambda_n$
the corresponding eigenvalues. Prove that every eigenvalue of $A$ is one of the $\lambda_1, \ldots, \lambda_n$.

♦ 8.3.11. Show that if $A$ is complete, then every similar matrix $B = S^{-1} A S$ is also complete.

♦ 8.3.12. (a) Prove that if $x \pm i y$ is a complex conjugate pair of eigenvectors of a real matrix
$A$ corresponding to complex conjugate eigenvalues $\mu \pm i \nu$ with $\nu \neq 0$, then $x$ and $y$ are
linearly independent real vectors. (b) More generally, if $v_j = x_j \pm i y_j$, $j = 1, \ldots, k$,
are complex conjugate pairs of eigenvectors corresponding to distinct pairs of complex
conjugate eigenvalues $\mu_j \pm i \nu_j$, $\nu_j \neq 0$, then the real vectors $x_1, \ldots, x_k, y_1, \ldots, y_k$ are
linearly independent. (c) Prove that if $A$ is complete, then there exists a basis of $\mathbb{R}^n$
consisting of its real eigenvectors and real and imaginary parts of its complex eigenvectors.


Diagonalization


Let $L \colon \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation on n-dimensional Euclidean space. As we know,
cf. Theorem 7.5, $L[x] = A x$ is prescribed by multiplication by an $n \times n$ matrix $A$. However,
the matrix representing a given linear transformation will depend on the choice of basis
for the underlying vector space $\mathbb{R}^n$. Linear transformations having a complicated matrix
representation in terms of the standard basis $e_1, \ldots, e_n$ may be considerably simplified by
choosing a suitably adapted basis $v_1, \ldots, v_n$. We are now in a position to understand how
to effect such a simplification.

For example, the linear transformation $L \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x - y \\ 2x + 4y \end{pmatrix}$ studied in Example 7.19
is represented by the matrix $A = \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix}$ when expressed in terms of the standard
basis of $\mathbb{R}^2$. In terms of the alternative basis $v_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$, $v_2 = \begin{pmatrix} 1 \\ -2 \end{pmatrix}$, it is represented
by the diagonal matrix $\begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}$, implying that it has a simple stretching action on the
new basis vectors: $A v_1 = 2 v_1$, $A v_2 = 3 v_2$. Now we can understand the reason for
this simplification. The new basis consists of the two eigenvectors of the matrix $A$. This
observation is indicative of a general fact: representing a linear transformation in terms
of an eigenvector basis has the effect of changing its matrix representative into a simple
diagonal form, thereby diagonalizing the original coefficient matrix.

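According to (7.31), if $v_1, \ldots, v_n$ form a basis of $\mathbb{R}^n$, then the corresponding matrix
representative of the linear transformation $L[v] = A v$ is given by the similar matrix
$B = S^{-1} A S$, where $S = ( v_1 \; v_2 \; \ldots \; v_n )$ is the matrix whose columns are the basis
vectors. In the preceding example,

$$S = \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix}, \quad \text{and hence} \quad S^{-1} A S = \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}.$$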

Definition 8.24. A square matrix $A$ is called diagonalizable if there exists a nonsingular
matrix $S$ and a diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ such that

$$S^{-1} A S = \Lambda, \quad \text{or, equivalently,} \quad A = S \Lambda S^{-1}. \qquad (8.30)$$


A diagonal matrix represents a linear transformation that simultaneously stretches†
in the direction of the basis vectors. Thus, every diagonalizable matrix represents an
elementary combination of (complex) stretching transformations.

To understand the diagonalization equation (8.30), we rewrite it in the equivalent form

$$A S = S \Lambda. \qquad (8.31)$$

Using the columnwise action (1.11) of matrix multiplication, one easily sees that the kth
column of this $n \times n$ matrix equation is given by

$$A v_k = \lambda_k v_k,$$

where $v_k$ denotes the kth column of $S$. Therefore, the columns of $S$ are necessarily
eigenvectors, and the entries of the diagonal matrix $\Lambda$ are the corresponding eigenvalues. And,
as a result, a diagonalizable matrix $A$ must have $n$ linearly independent eigenvectors, i.e.,
an eigenvector basis, to form the columns of the nonsingular diagonalizing matrix $S$. Since
the diagonal form $\Lambda$ contains the eigenvalues along its diagonal, it is uniquely determined
up to a permutation of its entries.

† A negative diagonal entry represents the combination of a reflection and stretch. Complex
entries indicate complex "stretching" transformations. See Section 7.2 for details.
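
As a rough numerical check of the columnwise reading of $AS = S\Lambda$, the following Python sketch uses the matrix of the transformation $L(x, y) = (x - y, 2x + 4y)$ and the candidate eigenvectors $(1, -1)^T$ and $(1, -2)^T$. The helper names `matmul` and `inverse_2x2` are hypothetical and written only for this illustration; the code itself confirms that the chosen columns are eigenvectors.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    """Exact inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1, -1], [2, 4]]
S = [[1, 1], [-1, -2]]       # columns are the candidate eigenvectors v1, v2
Lam = matmul(matmul(inverse_2x2(S), A), S)

# A S and S Lam agree column by column: A v_k = lambda_k v_k.
AS = matmul(A, S)
SLam = matmul(S, Lam)

print(Lam)         # diagonal, with the eigenvalues on the diagonal
print(AS == SLam)
```

Reordering the columns of `S` permutes the diagonal entries of `Lam` in the same way, which is the sense in which the diagonal form is unique only up to a permutation.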

Now, as we know, not every matrix has an eigenvector basis. Moreover, even when it
exists, the eigenvector basis may be complex, in which case $S$ is a complex matrix, and the
entries of the diagonal matrix $\Lambda$ are the complex eigenvalues. Thus, we should distinguish
between complete matrices that are diagonalizable over the complex numbers and the more
restrictive class of real matrices that can be diagonalized by a real matrix $S$.


Theorem  8.25.  A  matrix  is  complex  diagonalizable  if  and  only  if  it  is  complete.  A  real 
matrix  is  real  diagonalizable  if  and  only  if  it  is  complete  and  has  all  real  eigenvalues. 


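Example 8.26. The $3 \times 3$ matrix $A = \begin{pmatrix} 0 & -1 & -1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$ considered in Example 8.5 has
eigenvector basis

$$v_1 = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}.$$

We assemble these to form the eigenvector matrix

$$S = \begin{pmatrix} -1 & -1 & -1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \quad \text{whereby} \quad S^{-1} = \begin{pmatrix} -1 & 0 & -1 \\ -1 & -1 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$

The diagonalization equation (8.30) becomes

$$S^{-1} A S = \begin{pmatrix} -1 & 0 & -1 \\ -1 & -1 & 0 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 0 & -1 & -1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} -1 & -1 & -1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} = \Lambda,$$

with the eigenvalues of $A$ appearing on the diagonal of $\Lambda$, in the same order as the eigenvectors.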


Remark. If a matrix is not complete, then it cannot be diagonalized. A simple example is
a matrix of the form $A = \begin{pmatrix} 1 & c \\ 0 & 1 \end{pmatrix}$ with $c \neq 0$, which represents a shear in the direction of
the x-axis. Incomplete matrices represent generalized shearing transformations, and will
be the subject of Section 8.6.


Exercises 


8.3.13.  Diagonalize  the  following  matrices:  (a) 

(  8 


( d ) 


(h) 


/  —2  3 

0  1 

V  0  0 


(e) 


V 


o 

3  0 
3  0 


(0 


/ 


3 

5 


( b ) 

3 

6 


\  —5  -8  -7 


/ 1 

0 

-1 

1\ 

(0 

0 

1 

o\ 

/ 

2 

1 

-1 

o\ 

0 

2 

-1 

1 

0 

0 

0 

1 

,  O') 

-3 

-2 

0 

1 

0 

0 

-1 

0 

>  w 

1 

0 

0 

0 

0 

0 

1 

-2 

\o 

0 

0 

-2  ) 

\0 

1 

0 

o) 

V 

0 

0 

1 

-1/ 

4 

5 

5 

2 

5 
8.3.14. Diagonalize the Fibonacci matrix $F = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$.


8.3.15. Diagonalize the matrix $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ of rotation through 90°. How would you interpret
the result?

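8.3.16. Diagonalize the rotation matrices (a) $\begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, (b) $\begin{pmatrix} \frac{5}{13} & 0 & \frac{12}{13} \\ 0 & 1 & 0 \\ -\frac{12}{13} & 0 & \frac{5}{13} \end{pmatrix}$.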

8.3.17.  Which  of  these  matrices  have  real  diagonal  forms?  (a) 


-2  1 
4  1 


>  (b) 


1 

-3 


2 

1 


/  0  1  0\ 


(c) 


10  1 

110/ 


/  0 


( d ) 


/  3  -8  2\ 


3  2 

1  1-11,  (e)  -1  2  2 

V  1  1  0/  V  1  —3  —1  /  \  1  -4  2  J 

8.3.18.  Diagonalize  the  following  complex  matrices: 

1  —  i  0  \  ,  v  (2  -  i  2+i 

i  2+  i  ’  [c'  3-i  1+  i 


.  (f) 


(a) 


i  1 
1  i 


(b) 


1 
0 
0 

V-l 


(d) 


8.3.19.  Write  down  a  real  matrix  that  has 

(a)  eigenvalues  —1,3  and  corresponding  eigenvectors 

(b)  eigenvalues  0,  2,  —2  and  associated  eigenvectors 


(c)  an  eigenvalue  of  3  and  corresponding  eigenvectors 


(d)  an  eigenvalue  —  1  +  2i  and  corresponding  eigenvector 


1 

2 


1 

1 


(e)  an  eigenvalue  —2  and  corresponding  eigenvector 


/  2  \ 
0 

V-i  / 
/ 


(f)  an  eigenvalue  3  +  i  and  corresponding  eigenvector 


2  i 

V-l-  i 


8.3.20.  A  matrix  A  has  eigenvalues  —1  and  2  and  associated  eigenvectors 


0 

1 

0 

1 

-  i 

-  i 
1 


0 

0 

1 

1 


0\ 
0 
0 
1  / 


0 
1 

0  -i  ) 


1\ 

1 


and 


2 

3 


Write  down  the  matrix  form  of  the  linear  transformation  L[ u]  =  Aw  in  terms  of  (a)  the 
standard  basis  e1?e2;  (b)  the  basis  consisting  of  its  eigenvectors;  (c)  the  basis  (  ^  J,  (  ^ 


♦ 8.3.21. Prove that two complete matrices $A, B$ have the same eigenvalues (with multiplicities)
if and only if they are similar, i.e., $B = S^{-1} A S$ for some nonsingular matrix $S$.


8.3.22. Let $B$ be obtained from $A$ by permuting both its rows and columns using the same
permutation $\pi$, so $b_{ij} = a_{\pi(i)\,\pi(j)}$. Prove that $A$ and $B$ have the same eigenvalues. How are
their eigenvectors related?

8.3.23.  True  or  false:  If  A  is  a  complete  upper  triangular  matrix,  then  it  has  an  upper 
triangular  eigenvector  matrix  S. 

8.3.24.  How  many  different  diagonal  forms  does  an  n  x  n  diagonalizable  matrix  have? 

8.3.25. Characterize all complete matrices that are their own inverses: $A^{-1} = A$. Write down a
non-diagonal example.


♥ 8.3.26. Two $n \times n$ matrices $A, B$ are said to be simultaneously diagonalizable if there is a
nonsingular matrix $S$ such that both $S^{-1} A S$ and $S^{-1} B S$ are diagonal matrices.

(a) Show that simultaneously diagonalizable matrices commute: $AB = BA$.

(b) Prove that the converse is valid, provided that one of the matrices has no multiple
eigenvalues. (c) Is every pair of commuting matrices simultaneously diagonalizable?


8.4  Invariant  Subspaces 

The  notion  of  an  invariant  subspace  of  a  linear  map  plays  an  important  role  in  dynamical 
systems,  both  finite-  and  infinite-dimensional,  as  well  as  in  linear  iterative  systems,  and  in 
linear  control  systems.  With  the  theory  of  eigenvalues  and  eigenvectors  in  hand,  we  are 
now  able  to  completely  characterize  them. 


Definition 8.27. Let $L \colon V \to V$ be a linear transformation on a vector space $V$. A
subspace $W \subset V$ is said to be invariant if $L[w] \in W$ whenever $w \in W$.

Trivial examples of invariant subspaces, valid for any linear map, are the entire space
$W = V$ and the zero subspace $W = \{0\}$. If $L = I$ is the identity transformation, then
every subspace $W \subset V$ is invariant. More interestingly, both the kernel and image of $L$ are
invariant subspaces. Indeed, when $w \in \ker L$, then $L[w] = 0 \in \ker L$, proving invariance.
Similarly, if $w \in \operatorname{img} L$, then $L[w]$ also lies in $\operatorname{img} L$ by the definition of image.


Example 8.28. Let $V = \mathbb{R}^2$. Let us find all invariant subspaces of the scaling
transformation $L(x, y) = (2x, 3y)^T$. If $W$ is the line spanned by a vector $w = (a, b)^T \neq 0$,
then $L[w] = (2a, 3b)^T \in W$ if and only if $(2a, 3b)^T = c\,w = (ca, cb)^T$ for some scalar $c$.
This is clearly possible if and only if either $a = 0$ or $b = 0$. Thus, the only one-dimensional
invariant subspaces of this scaling transformation are the x- and y-axes.

Next, consider the linear transformation $L(x, y) = (x + 3y, y)^T$ corresponding to a
shear in the direction of the x-axis. By the same reasoning, the one-dimensional subspace
spanned by $w = (a, b)^T \neq 0$ is invariant if and only if $(a + 3b, b)^T = (ca, cb)^T$ for $c \in \mathbb{R}$,
which is possible only if $b = 0$. Thus, the only one-dimensional invariant subspace of this
shearing transformation is the x-axis itself.

Finally, consider the linear transformation $L(x, y) = (-y, x)^T$ corresponding to
counterclockwise rotation by 90°. It is easy to see, either geometrically or algebraically, that
$L$ has no nontrivial invariant subspaces. On the other hand, if we view $L$ as a map on
$\mathbb{C}^2$, then one can show that there are two one-dimensional complex invariant subspaces,
namely those spanned by its eigenvectors $w_1 = (1, i)^T$ and $w_2 = (1, -i)^T$.
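
The three cases above can be checked mechanically. The following Python sketch tests whether the line spanned by $w$ is invariant by asking whether $L[w]$ is parallel to $w$, that is, whether the $2 \times 2$ determinant with columns $w$ and $L[w]$ vanishes. The function name `spans_invariant_line` is hypothetical, and the test uses exact comparison, which is adequate for the integer and Gaussian-integer inputs used here but not for general floating-point data.

```python
def spans_invariant_line(L, w):
    """True if the line spanned by the nonzero vector w is invariant under L."""
    a, b = w
    if a == 0 and b == 0:
        raise ValueError("w must be nonzero")
    image = L(a, b)
    # w and L[w] are parallel exactly when det(w, L[w]) = 0.
    return a * image[1] - b * image[0] == 0


def scaling(x, y):
    return (2 * x, 3 * y)


def shear(x, y):
    return (x + 3 * y, y)


def rotation(x, y):
    return (-y, x)


print(spans_invariant_line(scaling, (1, 0)), spans_invariant_line(scaling, (1, 1)))
print(spans_invariant_line(shear, (1, 0)), spans_invariant_line(shear, (0, 1)))
# No real line is invariant under the rotation, but the complex eigenlines are.
print(spans_invariant_line(rotation, (1, 2)), spans_invariant_line(rotation, (1, 1j)))
```

For the rotation, the determinant equals $a^2 + b^2$, which is nonzero for every real $w \neq 0$ but vanishes for the complex vectors $(1, \pm i)^T$.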


From here on, we restrict our attention to the finite-dimensional case, and so the linear
transformation $L$ on either $\mathbb{R}^n$ or $\mathbb{C}^n$ is given by matrix multiplication: $L[x] = A x$ for
some $n \times n$ matrix $A$.

Proposition 8.29. A one-dimensional subspace is invariant under the linear
transformation $L[x] = A x$ if and only if it is an eigenline spanned by an eigenvector of $A$.


Proof: Let $W$ be spanned by the single non-zero vector $w \neq 0$. Then $A w \in W$ if and
only if $A w = \lambda w$ for some scalar $\lambda$. But this means that $w$ is an eigenvector of $A$ with
eigenvalue $\lambda$. Q.E.D.
Thus, if $A$ has no multiple eigenvalues, it has a finite number of one-dimensional
invariant subspaces, namely the eigenspaces (eigenlines) associated with each eigenvalue. On
the other hand, if $\lambda$ is a multiple eigenvalue, then every one-dimensional subspace of its
eigenspace $W \subset V_\lambda = \{\, v \mid A v = \lambda v \,\}$ is invariant.

We already observed that if $A = \lambda I$, then every subspace $V \subset \mathbb{C}^n$ is invariant. On
the other hand, if $A$ is diagonal, with all distinct entries, then the invariant subspaces are
necessarily spanned by a finite collection of standard basis vectors $e_{i_1}, \ldots, e_{i_k}$, which are
recognized as its eigenvectors. This is a special case of the following general characterization
of invariant subspaces of complete matrices. The incomplete case will be dealt with in
Section 8.6.


Theorem 8.30. If $A$ is a complete matrix, then every $k$-dimensional complex invariant
subspace is spanned by $k$ linearly independent eigenvectors of $A$.


Proof: Let $W \neq \{0\}$ be a nontrivial invariant subspace. Thanks to completeness, we can
express every nonzero vector $0 \neq w \in W$ as a linear combination, $w = c_1 v_1 + \cdots + c_j v_j$,
where $v_1, \ldots, v_j$ are eigenvectors associated with distinct eigenvalues $\lambda_1, \ldots, \lambda_j$ of $A$ and
all coefficients $c_i \neq 0$. We claim that this implies that each represented eigenvector $v_i \in W$
for $i = 1, \ldots, j$, which clearly establishes the result. To prove the claim, we write

$$A w - \lambda_j w = c_1 (\lambda_1 - \lambda_j) v_1 + \cdots + c_{j-1} (\lambda_{j-1} - \lambda_j) v_{j-1} \in W,$$

since, by the assumption of invariance, both terms on the left hand side belong to $W$.
Moreover, since the eigenvalues are distinct, we must have $c_i (\lambda_i - \lambda_j) \neq 0$ for $i = 1, \ldots, j - 1$.
Iterating this process, we eventually conclude that a nonzero multiple of $v_1$ and hence $v_1$
itself belongs to $W$. This result is independent of the ordering of the eigenvectors, and
hence all $v_1, \ldots, v_j \in W$. Q.E.D.


If $A$ is a complete real matrix that possesses all real eigenvalues, then the same proof
shows that every real invariant subspace has the form given in Theorem 8.30. If $A$ is real
and complete, with complex conjugate eigenvalues, Theorem 8.30 describes its complex
invariant subspaces. Its real invariant subspaces are obtained from the real and imaginary
parts of the eigenvectors. For example, if $v_\pm = x \pm i y$ are a complex conjugate
pair of eigenvectors, then they individually span one-dimensional complex invariant
subspaces. However, the smallest corresponding real invariant subspace is the two-dimensional
subspace spanned by $x$ and $y$.


Example 8.31. Consider the three-dimensional rotation (permutation) matrix

$$A = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

It has one real eigenvalue, $\lambda_1 = 1$, and two complex conjugate eigenvalues,
$\lambda_2 = -\frac{1}{2} + \frac{\sqrt{3}}{2}\, i$ and $\lambda_3 = -\frac{1}{2} - \frac{\sqrt{3}}{2}\, i$. The complex invariant subspaces are spanned by
0, 1, 2, or 3 of the corresponding complex eigenvectors $(1, 1, 1)^T$, $\left( -\frac{1}{2} \pm \frac{\sqrt{3}}{2}\, i, \; -\frac{1}{2} \mp \frac{\sqrt{3}}{2}\, i, \; 1 \right)^T$.
There is a single one-dimensional real invariant subspace, spanned by the real eigenvector
$(1, 1, 1)^T$, and a single two-dimensional real invariant subspace, which is the orthogonal
complement spanned by the real and imaginary parts of the complex conjugate eigenvectors.
Indeed, this indicates a general property satisfied by all $3 \times 3$ rotation matrices, with
the exception of the trivial identity matrix. The unique one-dimensional real invariant
subspace is the axis of the rotation, and the matrix reduces to a two-dimensional rotation
on its orthogonal complement. See Exercise 8.2.44 for further details.
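
A small numerical sketch can confirm these statements. It assumes the cyclic permutation matrix that sends $(v_1, v_2, v_3)^T$ to $(v_3, v_1, v_2)^T$; the transposed permutation matrix has the same eigenvalues and the same real invariant subspaces. The helper `apply` and the tolerance-based checks are illustrative choices, not part of the text.

```python
import cmath


def apply(matrix, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


# Assumed cyclic permutation matrix: A (v1, v2, v3) = (v3, v1, v2).
A = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

omega = cmath.exp(2j * cmath.pi / 3)          # -1/2 + (sqrt(3)/2) i
eigenvalues = [1, omega, omega.conjugate()]

residuals = []
for lam in eigenvalues:
    v = [lam ** 2, lam, 1]                    # candidate eigenvector for lam
    Av = apply(A, v)
    residuals.append(max(abs(a - lam * x) for a, x in zip(Av, v)))

# Real and imaginary parts of the complex eigenvector span the invariant plane,
# which should be orthogonal to the rotation axis (1, 1, 1).
v = [omega ** 2, omega, 1]
x = [z.real for z in v]
y = [z.imag for z in v]
axis_dot_x = sum(x)
axis_dot_y = sum(y)

print(residuals)               # all close to zero
print(axis_dot_x, axis_dot_y)  # both close to zero
```

The eigenvector formula $(\lambda^2, \lambda, 1)^T$ follows directly from $A v = \lambda v$ together with $\lambda^3 = 1$.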
Exercises


8.4.1. Find all invariant subspaces $W \subset \mathbb{R}^3$ of the following linear transformations $L$:


T  T 

(a)  the  scaling  transformation  ( 2  x,  3  y,  Az )  ;  (b)  the  shear  (x  + 3  y,y,z) 

(c)  counterclockwise  rotation  by  a  45°  angle  around  the  x-axis. 

1  2 
2  1 


8.4.2.  Find  all  invariant  subspaces  of  the  following  matrices:  (a) 


( b ) 


(0 

0 

2\ 

(c) 

0 

1 

0 

» (d) 

V2 

0 

(V 

V 

6 

4 

4 


0 

2 

0 


■8  \ 
4 

6  / 


(e) 


8.4.3.  Find  all  complex  invariant  subspaces  and  all  real  invariant  subspaces  of 

/  3  3  5\  2 


(a) 


0 

1 


1 

0 


(*>) 


-1 

1 


2 

1 


(c) 


5  6  5 

V-5  -8  -7 


(d) 


0 


3 

-2 


(0 

0 

1 

°\ 

f  1 

0 

1 

°\ 

0 

0 

0 

1 

,  (f) 

0 

1 

0 

0 

1 

0 

0 

0 

0 

0 

1 

0 

1 

0 

o) 

\0 

1 

0 

ij 

5 

2 


0 

3 


\0  -5 


5  \ 
0 

-3/ 


8.4.4. Prove that if $W$ is an invariant subspace for $A$, then it is also invariant for $A^2$. Is the
converse to this statement valid?


♦ 8.4.5. Let $V \subset \mathbb{R}^n$ be an invariant subspace for the $n \times n$ matrix $A$. Explain why every
eigenvalue and eigenvector of the linear map obtained by restricting $A$ to $V$ are also
eigenvalues and eigenvectors of $A$ itself.

8.4.6. True or false: If $V$ and $W$ are invariant subspaces for the matrix $A$, then so is

(a) $V + W$; (b) $V \cap W$; (c) $V \cup W$; (d) $V \setminus W$.


8.4.7.  True  or  false:  If  V  is  an  invariant  subspace  for  the  nxn  matrix  A  and  W  is  an  invariant 
subspace  for  the  nxn  matrix  b>,  then  V  AW  is  an  invariant  subspace  for  the  matrix  A  A  B. 


8.4.8. True or false: If $W$ is an invariant subspace of the matrix $A$, then it is also an invariant
subspace of $A^T$.

8.4.9. True or false: If $W$ is an invariant subspace of the nonsingular matrix $A$, then it is also
an invariant subspace of $A^{-1}$.

8.4.10.  Which  2  x  2  orthogonal  matrices  have  a  nontrivial  real  invariant  subspace? 

8.4.11. True or false: If $Q \neq \pm I$ is a $4 \times 4$ orthogonal matrix, then $Q$ has no real invariant
subspaces.

♦ 8.4.12. (a) Let $A$ be an $n \times n$ symmetric matrix, and let $v$ be an eigenvector. Prove that its
orthogonal complement under the dot product, namely, $v^\perp = \{\, w \in \mathbb{R}^n \mid v \cdot w = 0 \,\}$, is
an invariant subspace. (b) More generally, prove that if $W \subset \mathbb{R}^n$ is an invariant subspace,
then its orthogonal complement $W^\perp$ is also invariant.


8.5  Eigenvalues  of  Symmetric  Matrices 

Fortunately, the matrices that arise in most applications are complete and, in fact, possess
some additional structure that simplifies the calculation of their eigenvalues and
eigenvectors. The most important class is that of the symmetric, including positive definite,
matrices. Not only are the eigenvalues of a symmetric matrix necessarily real, but the
eigenvectors always form an orthogonal basis of the underlying Euclidean space, enjoying
all the wonderful properties we studied in Chapter 4. Indeed, this is by far the most
common way for orthogonal bases to appear: as the eigenvector bases of symmetric matrices.
Let us state this important result, but defer its proof until the end of the section.
Theorem 8.32. Let $A = A^T$ be a real symmetric $n \times n$ matrix. Then

(a) All the eigenvalues of $A$ are real.

(b) Eigenvectors corresponding to distinct eigenvalues are orthogonal.

(c) There is an orthonormal basis of $\mathbb{R}^n$ consisting of $n$ eigenvectors of $A$.
In particular, all real symmetric matrices are complete and real diagonalizable.


Remark. Orthogonality is with respect to the standard dot product on $\mathbb{R}^n$. As we noted
in Section 7.5, the transpose is a particular case of the adjoint operation when we use
the Euclidean dot product. An analogous result holds for more general self-adjoint linear
transformations under more general inner products on $\mathbb{R}^n$; see Exercise 8.5.10.


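Example 8.33. The $2 \times 2$ matrix $A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$ considered in Example 8.5 is symmetric,
and so has real eigenvalues $\lambda_1 = 4$ and $\lambda_2 = 2$. You can easily check that the corresponding
eigenvectors $v_1 = (1, 1)^T$ and $v_2 = (-1, 1)^T$ are orthogonal: $v_1 \cdot v_2 = 0$, and hence form
an orthogonal basis of $\mathbb{R}^2$. The orthonormal eigenvector basis promised by Theorem 8.32
is obtained by dividing each eigenvector by its Euclidean norm:

$$u_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}, \qquad u_2 = \begin{pmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}.$$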
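
The checks suggested in this example are easy to carry out by machine. The sketch below assumes the matrix is $\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$, which is the symmetric matrix determined by the stated eigenvalues 4, 2 and eigenvectors $(1, 1)^T$, $(-1, 1)^T$; the helper names `apply`, `dot`, and `normalize` are hypothetical.

```python
import math


def apply(matrix, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in matrix)


def dot(v, w):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(v, w))


def normalize(v):
    """Divide a nonzero vector by its Euclidean norm."""
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


A = [[3, 1], [1, 3]]                      # assumed matrix for this sketch
eigenpairs = [(4, (1, 1)), (2, (-1, 1))]

# Each pair should satisfy A v = lambda v exactly (integer arithmetic).
is_eigenpair = [apply(A, v) == tuple(lam * x for x in v) for lam, v in eigenpairs]

# Orthogonality of the eigenvectors, then the orthonormal basis u1, u2.
v1, v2 = eigenpairs[0][1], eigenpairs[1][1]
u1, u2 = normalize(v1), normalize(v2)

print(is_eigenpair)
print(dot(v1, v2))
print(u1, u2)
```

Normalization is done in floating point, so the unit vectors are orthonormal only up to rounding error.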


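Example 8.34. Consider the symmetric matrix $A = \begin{pmatrix} 5 & -4 & 2 \\ -4 & 5 & 2 \\ 2 & 2 & -1 \end{pmatrix}$. A straightforward
computation produces its eigenvalues and eigenvectors:

$$\lambda_1 = 9, \quad v_1 = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \qquad \lambda_2 = 3, \quad v_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad \lambda_3 = -3, \quad v_3 = \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}.$$

As the reader can check, the eigenvectors form an orthogonal basis of $\mathbb{R}^3$. An orthonormal
basis is provided by the corresponding unit eigenvectors

$$u_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{pmatrix}, \qquad u_2 = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{pmatrix}, \qquad u_3 = \begin{pmatrix} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \end{pmatrix}.$$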

The  eigenvalues  of  a  symmetric  matrix  can  be  used  to  test  its  positive  definiteness. 


Theorem 8.35. A symmetric matrix $K = K^T$ is positive definite if and only if all of its
eigenvalues are strictly positive.


Proof: First, if $K > 0$, then, by definition, $x^T K x > 0$ for all nonzero vectors $x \in \mathbb{R}^n$. In
particular, if $x = v \neq 0$ is an eigenvector with (necessarily real) eigenvalue $\lambda$, then

$$0 < v^T K v = v^T (\lambda v) = \lambda\, v^T v = \lambda\, \| v \|^2, \qquad (8.32)$$

which immediately proves that $\lambda > 0$.

Conversely, suppose $K$ has all positive eigenvalues. Let $u_1, \ldots, u_n$ be the orthonormal
eigenvector basis guaranteed by Theorem 8.32, with $K u_j = \lambda_j u_j$ with $\lambda_j > 0$. Writing

$$x = c_1 u_1 + \cdots + c_n u_n, \quad \text{we obtain} \quad K x = c_1 \lambda_1 u_1 + \cdots + c_n \lambda_n u_n.$$

Therefore, using the orthonormality of the eigenvectors,

$$x^T K x = ( c_1 u_1^T + \cdots + c_n u_n^T ) ( c_1 \lambda_1 u_1 + \cdots + c_n \lambda_n u_n ) = \lambda_1 c_1^2 + \cdots + \lambda_n c_n^2 > 0$$

whenever $x \neq 0$, since only $x = 0$ has coordinates $c_1 = \cdots = c_n = 0$. This establishes the
positive definiteness of $K$. Q.E.D.


Remark. The same proof shows that $K$ is positive semi-definite if and only if all its
eigenvalues satisfy $\lambda \geq 0$. A positive semi-definite matrix that is not positive definite
admits a zero eigenvalue and one or more null eigenvectors, i.e., solutions to $K v = 0$.
Every nonzero element $0 \neq v \in \ker K$ of its kernel is a null eigenvector.


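Example 8.36. The symmetric matrix $K = \begin{pmatrix} 8 & 0 & 1 \\ 0 & 8 & 1 \\ 1 & 1 & 7 \end{pmatrix}$ has characteristic equation

$$\det(K - \lambda I) = -\lambda^3 + 23 \lambda^2 - 174 \lambda + 432 = -(\lambda - 9)(\lambda - 8)(\lambda - 6),$$

and so its eigenvalues are 9, 8, and 6. Since they are all positive, $K$ is a positive definite
matrix. The associated eigenvectors are

$$\lambda_1 = 9, \quad v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad \lambda_2 = 8, \quad v_2 = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \qquad \lambda_3 = 6, \quad v_3 = \begin{pmatrix} -1 \\ -1 \\ 2 \end{pmatrix}.$$

Note that the eigenvectors form an orthogonal basis of $\mathbb{R}^3$, as guaranteed by Theorem 8.32.
As usual, we can construct a corresponding orthonormal eigenvector basis

$$u_1 = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{pmatrix}, \qquad u_2 = \begin{pmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{pmatrix}, \qquad u_3 = \begin{pmatrix} -\frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{pmatrix}$$

by dividing each eigenvector by its norm.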

Proof of Theorem 8.32: First recall that (see Exercise 3.6.38) if $A = A^T$ is real, symmetric,
then

$$(A v) \cdot w = v \cdot (A w) \quad \text{for all} \quad v, w \in \mathbb{C}^n, \qquad (8.33)$$

where $\cdot$ indicates the Euclidean dot product when the vectors are real and, more generally,
the Hermitian dot product $v \cdot w = v^T \overline{w}$ when they are complex.

To prove property (a), suppose $\lambda$ is a complex eigenvalue with complex eigenvector
$v \in \mathbb{C}^n$. Consider the Hermitian dot product of the complex vectors $A v$ and $v$:

$$(A v) \cdot v = (\lambda v) \cdot v = \lambda\, \| v \|^2.$$

On the other hand, by (8.33),

$$(A v) \cdot v = v \cdot (A v) = v \cdot (\lambda v) = v^T\, \overline{\lambda v} = \overline{\lambda}\, \| v \|^2.$$

Equating these two expressions, we deduce

$$\lambda\, \| v \|^2 = \overline{\lambda}\, \| v \|^2.$$

Since $v$ is an eigenvector, it must be nonzero. Thus, we deduce that $\lambda = \overline{\lambda}$, proving that
the eigenvalue $\lambda$ must be real.

To prove (b), suppose $A v = \lambda v$, $A w = \mu w$, where $\lambda \neq \mu$ are distinct real eigenvalues.
Then, again by (8.33),

$$\lambda\, v \cdot w = (A v) \cdot w = v \cdot (A w) = v \cdot (\mu w) = \mu\, v \cdot w, \quad \text{and hence} \quad (\lambda - \mu)\, v \cdot w = 0.$$

Since $\lambda \neq \mu$, this implies that $v \cdot w = 0$, so the eigenvectors $v, w$ are orthogonal.

Finally, the proof of (c) is easy if all the eigenvalues of $A$ are distinct. Theorem 8.21
implies that the eigenvectors form a basis of $\mathbb{R}^n$, and part (b) proves they are orthogonal.
(An alternative proof starts with orthogonality, and then applies Proposition 4.4 to prove
that the eigenvectors form a basis.) To obtain an orthonormal basis, we merely divide the
eigenvectors by their lengths: $u_k = v_k / \| v_k \|$, as in Lemma 4.2.

To prove (c) in general, we proceed by induction on the size $n$ of the matrix $A$. To start,
the case of a $1 \times 1$ matrix is trivial. (Why?) Next, suppose $A$ has size $n \times n$. We know
that $A$ has at least one eigenvalue, $\lambda_1$, which is necessarily real. Let $v_1$ be an associated
eigenvector. Let $V^\perp = \{\, w \in \mathbb{R}^n \mid v_1 \cdot w = 0 \,\}$ denote its orthogonal complement, the
subspace of all vectors orthogonal to the first eigenvector. Proposition 4.41 implies that
$\dim V^\perp = n - 1$, and so we can choose an orthonormal basis $y_1, \ldots, y_{n-1}$. Moreover, by
Exercise 8.4.12, $V^\perp$ is an invariant subspace of $A$, and hence $A$ defines a linear transformation
on $V^\perp$ that is represented by an $(n-1) \times (n-1)$ matrix, say $B = (b_{ij})$, $i, j = 1, \ldots, n - 1$,
with respect to the chosen orthonormal basis $y_1, \ldots, y_{n-1}$, so that $A y_i = \sum_{j=1}^{n-1} b_{ji}\, y_j$.
Moreover, by orthonormality and (8.33),

$$b_{ij} = y_i \cdot A y_j = (A y_i) \cdot y_j = b_{ji},$$

and hence $B = B^T$ is symmetric. Our induction hypothesis then implies that there is an
orthonormal basis of $V^\perp \subset \mathbb{R}^n$ consisting of eigenvectors $u_2, \ldots, u_n$ of $B$, and hence also
of $A$, each of which is orthogonal to $v_1$. Appending the unit eigenvector $u_1 = v_1 / \| v_1 \|$ to
this collection will complete the orthonormal basis of $\mathbb{R}^n$. Q.E.D.


Proposition 8.37. Let $A = A^T$ be an n × n symmetric matrix. Let $v_1, \dots, v_n$ be an
orthogonal eigenvector basis such that $v_1, \dots, v_r$ correspond to nonzero eigenvalues, while
$v_{r+1}, \dots, v_n$ are null eigenvectors corresponding to the zero eigenvalue (if any). Then
r = rank A; the non-null eigenvectors $v_1, \dots, v_r$ form an orthogonal basis for img A = coimg A,
while the null eigenvectors $v_{r+1}, \dots, v_n$ form an orthogonal basis for ker A = coker A.

Proof: The zero eigenspace coincides with the kernel, $V_0 = \ker A$. Thus, the linearly
independent null eigenvectors form a basis for ker A, which has dimension n − r where
r = rank A. Moreover, the remaining r non-null eigenvectors are orthogonal to the null
eigenvectors. Therefore, they must form a basis for the kernel's orthogonal complement,
namely coimg A = img A. Q.E.D.
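
A small numerical check can make Proposition 8.37 concrete. The sketch below uses a rank-one symmetric matrix chosen purely for illustration (it does not come from the text), and the helper names `matvec` and `dot` are ad hoc. It checks that the non-null eigenvector spans the image while the null eigenvector lies in the kernel.

```python
def matvec(A, x):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

A = [[1, 2], [2, 4]]      # illustrative symmetric matrix of rank 1
v1 = [1, 2]               # eigenvector for the nonzero eigenvalue 5
v2 = [-2, 1]              # null eigenvector (eigenvalue 0)

Av1 = matvec(A, v1)       # equals 5 * v1
Av2 = matvec(A, v2)       # zero vector, so v2 lies in ker A
orth = dot(v1, v2)        # 0: the eigenvectors are orthogonal

# Each column of A is parallel to v1, so v1 spans img A (r = rank A = 1).
columns = [list(c) for c in zip(*A)]
cross = [c[0] * v1[1] - c[1] * v1[0] for c in columns]
print(Av1, Av2, orth, cross)
```

Here r = 1, so a single non-null eigenvector is a basis for img A and a single null eigenvector is a basis for ker A.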
Exercises

8.5.1.  Find  the  eigenvalues  and  an  orthonormal  eigenvector  basis  for  the  following  symmetric 
matrices: 

6-4  1 


(a) 


( b ) 


(c) 


\ 

f  1 

0 

4\ 

/ 

).  (d) 

0 

1 

3 

.  (e) 

/ 

u 

3 

b 

V 

-4  6 

1  -1 


8.5.2.  Determine  whether  the  following  symmetric  matrices  are  positive  definite  by  computing 
their  eigenvalues.  Validate  your  conclusions  by  using  the  methods  from  Chapter  5. 


(a) 


(b) 


(c) 


(d) 


8.5.3.  Prove  that  a  symmetric  matrix  is  negative  definite  if  and  only  if  all  its  eigenvalues  are 
negative. 

8.5.4.  How  many  orthonormal  eigenvector  bases  does  a  symmetric  n  x  n  matrix  have? 

8.5.5. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. (a) Write down necessary and sufficient conditions on the entries
a, b, c, d that ensure that A has only real eigenvalues.

(b)  Verify  that  all  symmetric  2x2  matrices  satisfy  your  conditions. 

(c)  Write  down  a  non-symmetric  matrix  that  satisfies  your  conditions. 

T 8.5.6. Let $A^T = -A$ be a real, skew-symmetric n × n matrix. (a) Prove that the only possible
real eigenvalue of A is $\lambda = 0$. (b) More generally, prove that all eigenvalues $\lambda$ of A are
purely imaginary, i.e., $\operatorname{Re} \lambda = 0$. (c) Explain why 0 is an eigenvalue of A whenever n is
odd. (d) Explain why, if n = 3, the eigenvalues of $A \ne O$ are $0, \, i\omega, \, -i\omega$, for some real
$\omega \ne 0$. (e) Verify these facts for the particular matrices

/  0  0  2  0\ 

X  /II  ^  ll\  /III  -  l\ 

0  -2 
2  0 


(0 


(ii) 


V 


o 

3 

0 


3 
0 

4 


(Hi) 


\ 


0 

1 

1 


1 

0 

1 


1\ 

1 


O) 


0/ 


V 


o 

-2 

0 


0 

0 

3 


0  -3 

0  0 

0  0 ) 


T 8.5.7. (a) Prove that every eigenvalue of a Hermitian matrix A, satisfying $A^{\dagger} = A$ as
in Exercise 3.6.45, is real. (b) Show that the eigenvectors corresponding to distinct
eigenvalues are orthogonal under the Hermitian dot product on $\mathbb{C}^n$. (c) Find the
eigenvalues and eigenvectors of the following Hermitian matrices, and verify orthogonality:

/  0  i  0 


(n) 


(Hi) 


V 


—  1 
o 


i 

0 

i 


T 8.5.8. Let K, M be n × n matrices, with M > 0 positive definite. A nonzero vector $v \ne 0$ is
called a generalized eigenvector of the matrix pair K, M if it satisfies the generalized eigenvalue
equation

$$K v = \lambda M v, \qquad v \ne 0, \tag{8.34}$$

where the scalar $\lambda$ is the corresponding generalized eigenvalue. Note that the ordinary
eigenvalues and eigenvectors of K correspond to the case M = I. (a) Prove that $\lambda$ is a generalized
eigenvalue if and only if it satisfies the generalized characteristic equation $\det(K - \lambda M) = 0$.
(b) Prove that $\lambda$ is a generalized eigenvalue of the matrix pair K, M if and only if it is an
ordinary eigenvalue of the matrix $M^{-1} K$. How are the eigenvectors related? (c) Now
suppose K is a symmetric matrix. Prove that its generalized eigenvalues are all real.
Hint: First explain why this does not follow from part (a). Instead mimic the proof of part
(a) of Theorem 8.32, using the weighted Hermitian inner product $\langle v, w \rangle = v^T M \, \overline{w}$ in
place of the dot product. (d) Show that if K > 0, then its generalized eigenvalues are all
positive: $\lambda > 0$. (e) Prove that the eigenvectors corresponding to different generalized
eigenvalues are orthogonal under the weighted inner product $\langle v, w \rangle = v^T M w$. (f) Show
that, if the matrix pair K, M has n distinct generalized eigenvalues, then the eigenvectors
form an orthogonal basis for $\mathbb{R}^n$. Remark. One can, by mimicking the proof of part (c) of
Theorem 8.32, show that this holds even when there are repeated generalized eigenvalues.


8.5.9.  Compute  the  generalized  eigenvalues  and  eigenvectors,  as  in  (8.34),  for  the  following 
matrix  pairs.  Verify  orthogonality  of  the  eigenvectors  under  the  appropriate  inner  product. 


(b)  K 


(0  K={_\  J)-M=(_?  I)- (d) 


fl  2  0\ 

(l  1  0\ 

(e)  K  = 

2  8  2 

,M  = 

1  3  1  ,  (f )  K 

(0  2  !/ 

^0  11/ 

/  6  -8 

3  \ 

(1  0  °\ 

K  = 

-8  24 

-6  ,  M  = 

0  4  0 

\  3  — 6 

99  j 

(o  0  9/ 

(  5 

3  —5  \ 

(  3 

2  — 3\ 

3 

3  -1 

,M  = 

2 

2  -1 

V-5 

-1  9 ) 

^"3 

-1  5 ) 

0 8.5.10. Let $L = L^{*} \colon \mathbb{R}^n \to \mathbb{R}^n$ be a self-adjoint linear transformation with respect to the
inner product $\langle \cdot , \cdot \rangle$. Prove that all its eigenvalues are real and the eigenvectors are
orthogonal. Hint: Mimic the proof of Theorem 8.32, replacing the dot product by the
given inner product.


C 8.5.11. The difference map $\Delta \colon \mathbb{C}^n \to \mathbb{C}^n$ is defined as $\Delta = S - I$, where S is the shift map
of Exercise 8.2.13. (a) Write down the matrix D corresponding to $\Delta$. (b) Prove that the
sampled exponential vectors $\omega_0, \dots, \omega_{n-1}$ from (5.102) form an eigenvector basis of D.
What are the eigenvalues? (c) Prove that $K = D^T D$ has the same eigenvectors as D.
What are its eigenvalues? (d) Is K positive definite? (e) According to Theorem 8.32
the eigenvectors of a symmetric matrix are real and orthogonal. Use this to explain the
orthogonality of the sampled exponential vectors. But, why aren't they real?


C 8.5.12. An n × n circulant matrix has the form

$$C = \begin{pmatrix} c_0 & c_1 & c_2 & c_3 & \cdots & c_{n-1} \\ c_{n-1} & c_0 & c_1 & c_2 & \cdots & c_{n-2} \\ c_{n-2} & c_{n-1} & c_0 & c_1 & \cdots & c_{n-3} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ c_1 & c_2 & c_3 & c_4 & \cdots & c_0 \end{pmatrix},$$

in which the entries of each succeeding row are obtained by moving all the previous row's
entries one slot to the right, the last entry moving to the front. (a) Check that the shift
matrix S of Exercise 8.2.13, the difference matrix D, and its symmetric product $K = D^T D$
of Exercise 8.5.11 are all circulant matrices. (b) Prove that the sampled exponential vectors
$\omega_0, \dots, \omega_{n-1}$, cf. (5.102), are eigenvectors of C. Thus, all circulant matrices have the same
eigenvectors! What are the eigenvalues? (c) Prove that $F_n^{-1} C F_n = \Lambda$, where $F_n$ is the
Fourier matrix in Exercise 5.6.9 and $\Lambda$ is the diagonal matrix with the eigenvalues of C
along the diagonal. (d) Find the eigenvalues and eigenvectors of the following circulant
matrices:


(0 


(1 

3 

\2 


2 

1 

3 


(m) 


/  1 

-1 

-1 

n 

(  2 

-1 

0 

-n 

1 

1 

-1 

-i 

,  (iv) 

-1 

2 

-1 

0 

-1 

1 

1 

-i 

0 

-1 

2 

-1 

W 

-1 

1 

i) 

w 

0 

-1 

2  / 

(e)  Find  the  eigenvalues  of  the  tricirculant  matrices  in  Exercise  1.7.13.  Can  you 
find  a  general  formula  for  the  n  x  n  version?  Explain  why  the  eigenvalues  must  be 

real  and  positive.  Does  your  formula  reflect  this  fact?  (f)  Which  of  the  preceding 
matrices  are  invertible?  Write  down  a  general  criterion  for  checking  the  invertibility 
of  circulant  matrices. 
The Spectral Theorem

Every real, symmetric matrix admits an eigenvector basis, and hence is diagonalizable.
Moreover, since we can choose eigenvectors that form an orthonormal basis, the diagonalizing
matrix takes a particularly simple form. Recall that an n × n matrix Q is orthogonal
if and only if its columns form an orthonormal basis of $\mathbb{R}^n$. Alternatively, one characterizes
orthogonal matrices by the condition $Q^{-1} = Q^T$, as in Definition 4.18.

Using  the  orthonormal  eigenvector  basis  in  the  diagonalization  formula  (8.30)  results  in 
what  is  known  as  the  spectral  factorization  of  a  symmetric  matrix. 

Theorem 8.38. Let A be a real, symmetric matrix. Then there exists an orthogonal
matrix Q such that

$$A = Q \Lambda Q^{-1} = Q \Lambda Q^T, \tag{8.35}$$

where $\Lambda$ is a real diagonal matrix. The eigenvalues of A appear on the diagonal of $\Lambda$, while
the columns of Q are the corresponding orthonormal eigenvectors.


Remark.  The  term  “spectrum”  refers  to  the  eigenvalues  of  a  matrix,  or,  more  generally, 
a  linear  operator.  The  terminology  is  motivated  by  physics.  The  spectral  energy  lines  of 
atoms,  molecules,  and  nuclei  are  characterized  as  the  eigenvalues  of  the  governing  quantum 
mechanical Schrödinger operator, [54]. The Spectral Theorem 8.38 is the finite-dimensional
version  of  the  decomposition  of  quantum  mechanical  linear  operators  into  their  spectral 
eigenstates. 

Warning. Although both involve diagonal matrices, the spectral factorization $A = Q \Lambda Q^T$
and the Gaussian factorization $A = L D L^T$ of a regular symmetric matrix, cf. (1.58), are
completely different. In particular, the eigenvalues are not the pivots, so $\Lambda \ne D$.
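
To see the warning in numbers, the following sketch compares the pivots and the eigenvalues of one symmetric 2 × 2 matrix. The matrix entries are an illustrative choice, and the closed-form 2 × 2 formulas used here are standard rather than taken from the text.

```python
import math

# Symmetric 2 x 2 matrix A = [[a, b], [b, c]]; the entries are illustrative.
a, b, c = 3.0, 1.0, 3.0

# Pivots from Gaussian elimination: the diagonal of D in A = L D L^T.
pivots = [a, c - b * b / a]

# Eigenvalues from lambda^2 - (a + c) lambda + (a c - b^2) = 0:
# the diagonal of Lambda in A = Q Lambda Q^T.
trace, det = a + c, a * c - b * b
disc = math.sqrt(trace * trace - 4 * det)
eigenvalues = [(trace + disc) / 2, (trace - disc) / 2]

print(pivots)        # diagonal of D
print(eigenvalues)   # diagonal of Lambda
```

The two diagonals differ, although both products equal det A, which is a quick consistency check on either factorization.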

The spectral factorization (8.35) provides us with an alternative means of diagonalizing
the associated quadratic form $q(x) = x^T A x$, i.e., of completing the square. We write

$$q(x) = x^T A x = x^T Q \Lambda Q^T x = y^T \Lambda y = \sum_{i=1}^{n} \lambda_i y_i^2, \tag{8.36}$$

where the entries of $y = Q^T x = Q^{-1} x$ are the coordinates of x with respect to the
orthonormal eigenvector basis of A. In particular, $q(x) > 0$ for all $x \ne 0$, and so A is positive
definite, if and only if each eigenvalue $\lambda_i$ is strictly positive, reconfirming Theorem 8.35.


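Example 8.39. For the 2 × 2 matrix $A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$ considered in Example 8.33, the
orthonormal eigenvectors produce the diagonalizing orthogonal matrix

$$Q = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}.$$

The reader can validate the resulting spectral factorization:

$$\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} = A = Q \Lambda Q^T = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}.$$

According to formula (8.36), the associated quadratic form is diagonalized as

$$q(x) = 3 x_1^2 + 2 x_1 x_2 + 3 x_2^2 = 4 y_1^2 + 2 y_2^2,$$

where $y = Q^T x$, i.e., $y_1 = \dfrac{x_1 + x_2}{\sqrt{2}}$, $\; y_2 = \dfrac{-x_1 + x_2}{\sqrt{2}}$.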
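
The factorization in Example 8.39 can be validated numerically. The sketch below is one way to do so in plain Python; the helper names `matmul` and `transpose` and the sample point `x` are illustrative assumptions, not part of the text.

```python
import math

def matmul(X, Y):
    # Product of two matrices stored as lists of rows.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

s = 1 / math.sqrt(2)
A = [[3.0, 1.0], [1.0, 3.0]]
Q = [[s, -s], [s, s]]              # columns are the orthonormal eigenvectors
Lam = [[4.0, 0.0], [0.0, 2.0]]     # eigenvalues on the diagonal

recon = matmul(matmul(Q, Lam), transpose(Q))   # should reproduce A
QtQ = matmul(transpose(Q), Q)                  # should be the identity

# Compare both sides of q(x) = 4 y1^2 + 2 y2^2 at an arbitrary sample point.
x = [0.3, -1.2]
y = [s * (x[0] + x[1]), s * (-x[0] + x[1])]    # y = Q^T x
q_x = 3 * x[0] ** 2 + 2 * x[0] * x[1] + 3 * x[1] ** 2
q_y = 4 * y[0] ** 2 + 2 * y[1] ** 2
print(recon, QtQ, q_x, q_y)
```

Up to rounding, `recon` agrees with A, `QtQ` is the identity, and the two evaluations of the quadratic form coincide.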
Figure 8.3. Stretching a Circle into an Ellipse.


You can always choose Q to be a proper orthogonal matrix, so det Q = 1, since an
improper orthogonal matrix can be made proper by multiplying one of its columns by −1,
which does not affect its status as an eigenvector matrix. Since a proper orthogonal matrix
Q represents a rigid rotation of $\mathbb{R}^n$, the diagonalization of a symmetric matrix can be
interpreted as a rotation of the coordinate system that makes the orthogonal eigenvectors line
up along the coordinate axes. Therefore, a linear transformation $L[x] = A x$ represented
by a positive definite matrix A can be regarded as a combination of stretches in n mutually
orthogonal directions. A good way to visualize this is to consider the effect of the linear
transformation on the unit (Euclidean) sphere $S_1 = \{\, \| x \| = 1 \,\}$. Stretching the sphere in
mutually orthogonal directions will map it to an ellipsoid $E = L[S_1] = \{\, A x \mid \| x \| = 1 \,\}$
whose principal axes are aligned with the directions of stretch; see Figure 8.3 for the
two-dimensional case. For instance, in elasticity, the stress tensor of a deformed body
is represented by a positive definite matrix. Its eigenvalues are known as the principal
stretches and its eigenvectors the principal directions of the elastic deformation.


Exercises 


8.5.13.  Write  out  the  spectral  factorization  of  the  following  matrices: 

1  1 


(a) 


3  4 

4  3 


(*>) 


2 

1 


■1 

4 


(c) 


2 

1 


0\ 

(  3 

-1 

~i\ 

1 

.  (d) 

-1 

2 

0  . 

1/ 

^-1 

0 

2/ 

8.5.14.  Write  out  the  spectral  factorization  of  the  matrices  listed  in  Exercise  8.5.1. 


8.5.15.  Construct  a  symmetric  matrix  with  the  following  eigenvectors  and  eigenvalues,  or 

T  _  /  a  q \T 


explain  why  none  exists:  (a)  \1  =  1,  vi  =  (  f  5  f  ) 

(b)  Aj  =  -2,  Vl  =  (l,-1)T,  A2  =  1,  v2  =  (1,1)T, 
1,  v2  =  (-1,2)T,  (d)  A,  =2,  Vl  =  (2,1)T 


4  3 

5  ’  5 


a2  — 


^2  =  3>  v2  = 

(c)  Ax  =3,  vx  =  (2,  -1)T 
A2  =  2,  v2  =  ( 1,  2  )T  . 


T 8.5.16. (a) Find the eigenvalues and eigenvectors of the matrix $A = \begin{pmatrix} 2 & 1 & -1 \\ 1 & 2 & 1 \\ -1 & 1 & 2 \end{pmatrix}$.
(b) Use the eigenvalues to compute the determinant of A. (c) Is A positive definite? Why
or why not? (d) Find an orthonormal eigenvector basis of $\mathbb{R}^3$ determined by A or explain
why none exists. (e) Write out the spectral factorization of A if possible. (f) Use
orthogonality to write the vector $(1, 0, 0)^T$ as a linear combination of eigenvectors of A.


8.5.17.  Use  the  spectral  factorization  to  diagonalize  the  following  quadratic  forms: 
(a) $x^2 - 3xy + 5y^2$, (b) $3x^2 + 4xy + 6y^2$, (c) $x^2 + 8xz + y^2 + 6yz + z^2$,

(d)  | x2  —  xy  —  xz  +  y2  +  z2 ,  (e)  6x2  —  8xy  +  2xz  +  6y2  —  2yz  +  11  z2 . 
0 8.5.18. Let $u_1, \dots, u_n$ be an orthonormal basis of $\mathbb{R}^n$. Prove that it forms an eigenvector basis
for some symmetric n × n matrix A. Can you characterize all such matrices?


8.5.19.  True  or  false:  A  matrix  with  a  real  orthonormal  eigenvector  basis  is  symmetric. 


8.5.20. Prove that every quadratic form can be written as $x^T A x = \| x \|^2 \sum_{i=1}^{n} \lambda_i \cos^2 \theta_i$,
where $\lambda_i$ are the eigenvalues of A and $\theta_i = \angle (x, v_i)$ denotes the angle between x and the
ith eigenvector.

8.5.21. An elastic body has stress tensor $T = \begin{pmatrix} 3 & 1 & 2 \\ 1 & 3 & 1 \\ 2 & 1 & 3 \end{pmatrix}$. Find the principal stretches and
principal directions of stretch.

0  8.5.22.  Given  a  solid  body  spinning  around  its  center  of  mass,  the  eigenvectors  of  its  positive 
definite  inertia  tensor  prescribe  three  mutually  orthogonal  principal  directions  of  rotation, 
while  the  corresponding  eigenvalues  are  the  moments  of  inertia.  Given  the  inertia  tensor 


$$T = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 2 \end{pmatrix},$$

find  the  principal  directions  and  moments  of  inertia. 


0 8.5.23. Let K be a positive definite 2 × 2 matrix. (a) Explain why the quadratic equation
$x^T K x = 1$ defines an ellipse. Prove that its principal axes are the eigenvectors of K, and
the semi-axes are the reciprocals of the square roots of the eigenvalues.

(b) Graph and describe the following curves:

(i) $x^2 + 4y^2 = 1$, (ii) $x^2 + xy + y^2 = 1$, (iii) $3x^2 + 2xy + y^2 = 1$.

(c) What sort of curve(s) does $x^T K x = 1$ describe if K is not positive definite?


0 8.5.24. Let K be a positive definite 3 × 3 matrix. (a) Prove that the quadratic equation
$x^T K x = 1$ defines an ellipsoid in $\mathbb{R}^3$. What are its principal axes and semi-axes?

(b) Describe the surface defined by the equation $11x^2 - 8xy + 20y^2 - 10xz + 8yz + 11z^2 = 1$.


8.5.25. Prove that $A = A^T$ has a repeated eigenvalue if and only if it commutes, $A J = J A$,
with a nonzero skew-symmetric matrix: $J^T = -J \ne O$.

Hint: First prove this when A is a diagonal matrix.


8.5.26.  Find  all  positive  definite  orthogonal  matrices. 


0  8.5.27.  (a)  Prove  that  every  positive  definite  matrix  K  has  a  unique  positive  definite  square 
root ,  i.e.,  a  matrix  B  >  0  satisfying  B2  =  K. 

(b)  Find  the  positive  definite  square  roots  of  the  following  matrices: 


(*)  (i  2)’  (**)  (-1  1)’  (***) 


(2 

0 

°\ 

6 

-4 

i\ 

0 

5 

0 

,  (iv) 

-4 

6 

-1 

Vo 

0 

9  / 

\ 

1 

-1 

11/ 

0 8.5.28. The Polar Decomposition: Prove that every invertible matrix A has a polar
decomposition, written A = Q B, into the product of an orthogonal matrix Q and a
positive definite matrix B > 0. Show that if det A > 0, then Q is a proper orthogonal
matrix. Hint: Look at the Gram matrix $K = A^T A$ and use Exercise 8.5.27.

Remark. In mechanics, if A represents the deformation of a body, then Q represents a
rotation, while B represents a stretching along the orthogonal eigendirections of K. Thus,
every linear deformation of an elastic body can be decomposed into a pure stretching
transformation followed by a rotation.


8.5.29.  Find  the  polar  decompositions  A  =  Q  B,  as  defined  in  Exercise  8.5.28,  of  the  following 


matrices: 

<*)  ("  j)’  w  (1 1).  <«>  (j 


\ 

(0 

-3 

8\ 

(  1 

0 

i\ 

)>  (d) 

1 

0 

°  , 

(e) 

1 

-2 

0 

\o 

4 

6/ 

Vi 

1 

OJ 
0 8.5.30. The Spectral Theorem for Hermitian Matrices. Prove that a complex Hermitian matrix
can be factored as $H = U \Lambda U^{\dagger}$, where U is a unitary matrix and $\Lambda$ is a real diagonal matrix.
Hint: See Exercises 4.3.25, 8.5.7.


8.5.31.  Find  the  spectral  factorization,  as  in  Exercise  8.5.30,  of  the  following  Hermitian  matrices 

f  -1  5  i  -4 


( b ) 


2  i 


(c) 


V 


-5  i 
-4 


-1 
— 4i 


4i 


T 8.5.32. The Spectral Decomposition: Let A be a symmetric matrix with distinct eigenvalues
$\lambda_1, \dots, \lambda_k$. Let $V_j = \ker (A - \lambda_j I)$ denote the eigenspace corresponding to $\lambda_j$, and let $P_j$ be
the orthogonal projection matrix onto $V_j$, as defined in Exercise 4.4.9. (i) Prove that the
spectral factorization (8.35) can be rewritten as

$$A = \lambda_1 P_1 + \lambda_2 P_2 + \cdots + \lambda_k P_k, \tag{8.37}$$

expressing A as a linear combination of projection matrices. (ii) Write out the spectral
decomposition (8.37) for the matrices in Exercise 8.5.13. (iii) Show that

$$I = P_1 + P_2 + \cdots + P_k, \qquad \text{while} \qquad P_i^2 = P_i, \quad P_i P_j = O \ \text{ for } \ i \ne j.$$

(iv) Show that if p(t) is any polynomial, then

$$p(A) = p(\lambda_1) P_1 + p(\lambda_2) P_2 + \cdots + p(\lambda_k) P_k. \tag{8.38}$$

Remark. Replacing p(t) by any function f(t) allows one to define f(A) for any symmetric
matrix A.
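
As an illustration of (8.37) and the closing remark, the sketch below builds the projection matrices for the matrix of Example 8.39 and uses them to form a square root. It assumes each eigenspace is one-dimensional, so that $P_j = u_j u_j^T$ for a unit eigenvector $u_j$; the helper names are ad hoc.

```python
import math

def outer(u):
    # Rank-one projection u u^T onto the line spanned by the unit vector u.
    return [[ui * uj for uj in u] for ui in u]

def combine(coeffs, mats):
    # Linear combination sum_j coeffs[j] * mats[j] of square matrices.
    n = len(mats[0])
    return [[sum(c * M[i][j] for c, M in zip(coeffs, mats))
             for j in range(n)] for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

s = 1 / math.sqrt(2)
lams = [4.0, 2.0]                          # eigenvalues of [[3, 1], [1, 3]]
P = [outer([s, s]), outer([-s, s])]        # projections onto the eigenspaces

A = combine(lams, P)                       # lambda_1 P_1 + lambda_2 P_2
I = combine([1.0, 1.0], P)                 # P_1 + P_2
P1P2 = matmul(P[0], P[1])                  # should be the zero matrix
B = combine([math.sqrt(l) for l in lams], P)   # f(A) with f(t) = sqrt(t)
B2 = matmul(B, B)                          # should reproduce A
print(A, I, P1P2, B2)
```

Taking f(t) to be the square root gives a matrix B with B² = A, which connects this construction to the positive definite square root of Exercise 8.5.27.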


Optimization  Principles  for  Eigenvalues  of  Symmetric  Matrices 

As  we  learned  in  Chapter  5,  the  solution  to  a  linear  system  with  positive  definite  coefficient 
matrix  can  be  characterized  by  a  minimization  principle.  Thus,  it  should  come  as  no 
surprise  that  eigenvalues  of  positive  definite  matrices,  and  even  more  general  symmetric 
matrices,  can  also  be  characterized  by  some  sort  of  optimization  procedure.  A  number 
of  basic  numerical  algorithms  for  computing  eigenvalues  of  matrices  are  based  on  such 
optimization  principles. 

First, consider the relatively simple case of a real diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$.
We assume that the diagonal entries, which are the same as the eigenvalues, appear in
decreasing order,

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n, \tag{8.39}$$

so $\lambda_1$ is the largest eigenvalue, while $\lambda_n$ is the smallest. The effect of $\Lambda$ on a vector
$y = (y_1, y_2, \dots, y_n)^T \in \mathbb{R}^n$ is to multiply its entries by the diagonal eigenvalues:
$\Lambda y = (\lambda_1 y_1, \lambda_2 y_2, \dots, \lambda_n y_n)^T$. In other words, the linear transformation represented by the
coefficient matrix $\Lambda$ has the effect of stretching† the ith coordinate direction by the factor
$\lambda_i$. In particular, the maximal stretch occurs in the $e_1$ direction, with factor $\lambda_1$, while
the minimal (or largest negative) stretch occurs in the $e_n$ direction, with factor $\lambda_n$. The
germ of the optimization principles for characterizing the extreme eigenvalues is contained
in this geometrical observation.

Let us turn our attention to the associated quadratic form

$$q(y) = y^T \Lambda y = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2. \tag{8.40}$$

† If $\lambda_i < 0$, then the effect is to stretch and reflect.
Note that $q(t \, e_1) = \lambda_1 t^2$, and hence if $\lambda_1 > 0$, then q(y) has no maximum; on the other
hand, if $\lambda_1 \le 0$, so all eigenvalues are non-positive, then $q(y) \le 0$ for all y, and its
maximal value is q(0) = 0. Thus, in either case, a strict maximization of q(y) does not tell
us anything of importance.

Suppose, however, that we continue in our quest to maximize q(y), but now restrict y
to be a unit vector (in the Euclidean norm), so

$$\| y \|^2 = y_1^2 + \cdots + y_n^2 = 1.$$

In view of (8.39),

$$q(y) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \le \lambda_1 y_1^2 + \lambda_1 y_2^2 + \cdots + \lambda_1 y_n^2 = \lambda_1 \left( y_1^2 + \cdots + y_n^2 \right) = \lambda_1.$$

Moreover, $q(e_1) = \lambda_1$. We conclude that the maximal value of q(y) over all unit vectors is
the largest eigenvalue of $\Lambda$:

$$\lambda_1 = \max \{\, q(y) \mid \| y \| = 1 \,\}.$$

By the same reasoning, its minimal value equals the smallest eigenvalue:

$$\lambda_n = \min \{\, q(y) \mid \| y \| = 1 \,\}.$$

Thus, we can characterize the two extreme eigenvalues by optimization principles, albeit
of a slightly different character from what we dealt with in Chapter 5.

Now suppose A is any symmetric matrix. We use its spectral factorization (8.35) to
diagonalize the associated quadratic form

$$q(x) = x^T A x = x^T Q \Lambda Q^T x = y^T \Lambda y, \qquad \text{where} \qquad y = Q^T x = Q^{-1} x,$$

as in (8.36). According to the preceding discussion, the maximum of $y^T \Lambda y$ over all unit
vectors $\| y \| = 1$ is the largest eigenvalue $\lambda_1$ of $\Lambda$, which is the same as the largest eigenvalue
of A. Moreover, since Q is an orthogonal matrix, Proposition 7.24 tells us that it maps unit
vectors to unit vectors:

$$1 = \| y \| = \| Q^T x \| = \| x \|,$$

and so the maximum of q(x) over all unit vectors $\| x \| = 1$ is the same maximum eigenvalue
$\lambda_1$. Similar reasoning applies to the smallest eigenvalue $\lambda_n$. In this fashion, we have
established the basic optimization principles for the extreme eigenvalues of a symmetric
matrix.


Theorem 8.40. Let A be a symmetric matrix with real eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$.
Then

$$\lambda_1 = \max \{\, x^T A x \mid \| x \| = 1 \,\}, \qquad \lambda_n = \min \{\, x^T A x \mid \| x \| = 1 \,\} \tag{8.41}$$

are, respectively, its largest and smallest eigenvalues. The maximal value is achieved when
$x = \pm u_1$ is one of the unit eigenvectors associated with the largest eigenvalue $\lambda_1$; similarly,
the minimal value occurs at $x = \pm u_n$, a unit eigenvector for the smallest eigenvalue $\lambda_n$.


Remark. In multivariable calculus, the eigenvalue $\lambda$ plays the role of a Lagrange multiplier
for the constrained optimization problem, [2, 78, 79].

Example 8.41. The problem is to maximize the value of the quadratic form

$$q(x, y) = 3x^2 + 2xy + 3y^2$$

for all x, y lying on the unit circle $x^2 + y^2 = 1$. This maximization problem is precisely of
the form (8.41). The symmetric coefficient matrix for the quadratic form is $A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$,
whose eigenvalues are, according to Example 8.5, $\lambda_1 = 4$ and $\lambda_2 = 2$. Theorem 8.40 implies
that the maximum is the largest eigenvalue, and hence equal to 4, while its minimum is the
smallest eigenvalue, and hence equal to 2. Thus, evaluating q(x, y) on the unit eigenvectors,
we conclude that

$$q\!\left( -\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}} \right) = 2 \le q(x, y) \le 4 = q\!\left( \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}} \right) \qquad \text{for all} \qquad x^2 + y^2 = 1.$$
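
The bounds in Example 8.41 can be checked by brute force. The sketch below samples the unit circle on a uniform grid of angles (the grid size `N` is an arbitrary choice) and records the extreme values of q; it is a sanity check, not a proof.

```python
import math

def q(x, y):
    # Quadratic form of Example 8.41.
    return 3 * x * x + 2 * x * y + 3 * y * y

# Sample q at N equally spaced points (cos t, sin t) on the unit circle.
N = 3600
values = [q(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
          for k in range(N)]
q_max, q_min = max(values), min(values)
print(q_max, q_min)
```

On the circle, q(cos t, sin t) = 3 + sin 2t, so the sampled extremes land on the eigenvalues 4 and 2 up to rounding; the grid happens to contain the angles t = π/4 and t = 3π/4 where they are attained.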

In practical applications, the restriction of the quadratic form to unit vectors may not be
particularly convenient. We can, however, rephrase the eigenvalue optimization principles
in a form that utilizes general nonzero vectors. If $v \ne 0$, then $x = v / \| v \|$ is a unit vector.
Substituting this expression for x in the quadratic form $q(x) = x^T A x$ leads to the following
optimization principles for the extreme eigenvalues of a symmetric matrix:

$$\lambda_1 = \max \left\{ \frac{v^T A v}{\| v \|^2} \;\middle|\; v \ne 0 \right\}, \qquad \lambda_n = \min \left\{ \frac{v^T A v}{\| v \|^2} \;\middle|\; v \ne 0 \right\}. \tag{8.42}$$

Thus, we replace optimization of a quadratic polynomial over the unit sphere by optimization
of a rational function over all of $\mathbb{R}^n \setminus \{0\}$. The rational function being optimized is
called a Rayleigh quotient, after Lord Rayleigh, a prominent nineteenth-century British
scientist. For instance, referring back to Example 8.41, the maximum value of

$$r(x, y) = \frac{3x^2 + 2xy + 3y^2}{x^2 + y^2} \qquad \text{for all} \qquad \begin{pmatrix} x \\ y \end{pmatrix} \ne \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

is equal to 4, the same maximal eigenvalue of the corresponding coefficient matrix.
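
A short sketch of the Rayleigh quotient for the matrix of Example 8.41 follows. The function name `rayleigh` and the test vectors are illustrative choices; the point is that the quotient depends only on the direction of v and stays between the extreme eigenvalues.

```python
def rayleigh(A, v):
    # Rayleigh quotient v^T A v / ||v||^2 for a nonzero vector v.
    Av = [sum(a * vj for a, vj in zip(row, v)) for row in A]
    return sum(vi * wi for vi, wi in zip(v, Av)) / sum(vi * vi for vi in v)

A = [[3.0, 1.0], [1.0, 3.0]]

r_max = rayleigh(A, [1.0, 1.0])      # eigenvector for the largest eigenvalue
r_min = rayleigh(A, [-1.0, 1.0])     # eigenvector for the smallest eigenvalue
r_a = rayleigh(A, [2.0, -0.5])       # an arbitrary nonzero vector
r_b = rayleigh(A, [20.0, -5.0])      # the same direction, scaled by 10
print(r_max, r_min, r_a, r_b)
```

No normalization of v is needed, which is what makes the form (8.42) convenient in practice.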

What about characterizing one of the intermediate eigenvalues? Then we need to be
a little more sophisticated in designing the optimization principle. To motivate the
construction, look first at the diagonal case. If we restrict the quadratic form (8.40) to vectors
$y = (0, y_2, \dots, y_n)^T$ whose first component is zero, we obtain

$$q(y) = q(0, y_2, \dots, y_n) = \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2.$$

The maximum value of q(y) over all such y of norm 1 is, by the same reasoning, the
second largest eigenvalue $\lambda_2$. Moreover, $y \cdot e_1 = 0$, and so y can be characterized as being
orthogonal to the first standard basis vector, which also happens to be the eigenvector
of $\Lambda$ corresponding to the maximal eigenvalue $\lambda_1$. Thus, to find the second eigenvalue,
we maximize the quadratic form over all unit vectors that are orthogonal to the first
eigenvector. Similarly, if we want to find the jth largest eigenvalue $\lambda_j$, we maximize q(y)
over all unit vectors y whose first j − 1 components vanish, $y_1 = \cdots = y_{j-1} = 0$, or, stated
geometrically, over all vectors y such that $\| y \| = 1$ and $y \cdot e_1 = \cdots = y \cdot e_{j-1} = 0$, that is,
over all unit vectors orthogonal to the first j − 1 eigenvectors of $\Lambda$.

A similar reasoning based on the Spectral Theorem 8.38 and the orthogonality of
eigenvectors of symmetric matrices leads to the general result.


Theorem 8.42. Let A be a symmetric matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ and
corresponding orthogonal eigenvectors $v_1, \dots, v_n$. Then the maximal value of the quadratic
form $q(x) = x^T A x$ over all unit vectors that are orthogonal to the first j − 1 eigenvectors
is its jth eigenvalue:

$$\lambda_j = \max \{\, x^T A x \mid \| x \| = 1, \ x \cdot v_1 = \cdots = x \cdot v_{j-1} = 0 \,\}. \tag{8.43}$$


Thus, at least in principle, one can compute the eigenvalues and eigenvectors of a
symmetric matrix by the following recursive procedure. First, find the largest eigenvalue
$\lambda_1$ by the basic maximization principle (8.41) and its associated eigenvector $v_1$ by solving
the eigenvector system (8.13). The next largest eigenvalue $\lambda_2$ is then characterized by the
constrained maximization principle (8.43), and so on. Although of theoretical interest, this
algorithm is of somewhat limited value in numerical computations. Practical numerical
methods for computing eigenvalues will be developed in Sections 9.5 and 9.6.
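
The recursive procedure can be imitated numerically. The sketch below is only one possible realization and makes several assumptions that are not in the text: the maximization in (8.43) is carried out by projected gradient ascent on the unit sphere (step along A x, project out the eigenvectors already found, renormalize), the 3 × 3 test matrix is an illustrative choice, and the starting vector, step size `alpha`, and iteration count are arbitrary. It also returns the maximizing vector directly instead of solving the eigenvector system (8.13).

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def normalize(x):
    n = math.sqrt(dot(x, x))
    return [a / n for a in x]

def project_out(x, basis):
    # Remove the components of x along each unit vector in basis.
    for u in basis:
        c = dot(x, u)
        x = [a - c * b for a, b in zip(x, u)]
    return x

def constrained_max(A, found, steps=2000, alpha=0.5):
    # Maximize q(x) = x^T A x over unit vectors orthogonal to `found`
    # by projected gradient ascent (the gradient of q is 2 A x).
    x = normalize(project_out([1.0, 0.5, 0.25], found))  # hard-coded 3-vector start
    for _ in range(steps):
        Ax = matvec(A, x)
        x = [a + alpha * b for a, b in zip(x, Ax)]
        x = normalize(project_out(x, found))
    return dot(x, matvec(A, x)), x

# Illustrative symmetric matrix with eigenvalues 2 + sqrt(2), 2, 2 - sqrt(2).
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]

eigenvalues, eigenvectors = [], []
for _ in range(3):
    lam, u = constrained_max(A, eigenvectors)   # principle (8.43) with j - 1 constraints
    eigenvalues.append(lam)
    eigenvectors.append(u)
print(eigenvalues)
```

Each pass enforces the orthogonality constraints of (8.43) by projection, so the eigenvalues come out in decreasing order. The fixed step size relies on 1 + alpha·λ being positive for every eigenvalue, which holds for this matrix but would need adjusting in general.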


Exercises 


8.5.33. Find the minimum and maximum values of the quadratic form $5x^2 + 4xy + 5y^2$ where
x, y are subject to the constraint $x^2 + y^2 = 1$.

8.5.34. Find the minimum and maximum values of the quadratic form
$2x^2 + xy + 2xz + 2y^2 + 2z^2$ where x, y, z are required to satisfy $x^2 + y^2 + z^2 = 1$.

8.5.35. What are the minimum and maximum values of the following rational functions:

(a) $\dfrac{3x^2 - 2y^2}{x^2 + y^2}$, (b) $\dfrac{x^2 - 3xy + y^2}{x^2 + y^2}$, (c) $\dfrac{3x^2 + xy + 5y^2}{x^2 + y^2}$, (d) $\dfrac{2x^2 + xy + 3xz + 2y^2 + 2z^2}{x^2 + y^2 + z^2}$.

8.5.36. Find the minimum and maximum values of $q(x) = \sum_{i=1}^{n-1} x_i x_{i+1}$ for $\| x \| = 1$.
Hint: See Exercise 8.2.47.


8.5.37.  Write  down  and  solve  an  optimization  principle  characterizing  the  largest  and  smallest 
eigenvalues  of  the  following  positive  definite  matrices: 

(  6  -4  1\  /  4  -1  —2  \ 


(a) 


2 

-1 


1 

3 


(b) 


4  1 
1  4 


(c) 


V 


-4 

1 


6  -1 

1  11/ 


4  -1 

1  4/ 


8.5.38.  Write  down  a  maximization  principle  that  characterizes  the  middle  eigenvalue  of  the 
matrices  in  parts  (c)  and  (d)  of  Exercise  8.5.37. 

8.5.39.  Given  a  3  x  3  symmetric  matrix,  formulate  two  distinct  ways  of  characterizing  its 
middle  eigenvalue  A2. 

8.5.40. Suppose K > 0. What is the maximum value of $q(x) = x^T K x$ when x is constrained to
a sphere of radius $\| x \| = r$?

8.5.41. Let K > 0. Prove the product formula

$$\max \{\, x^T K x \mid \| x \| = 1 \,\} \; \min \{\, x^T K^{-1} x \mid \| x \| = 1 \,\} = 1.$$


0  8.5.42.  Write  out  the  details  in  the  proof  of  Theorem  8.42. 

0 8.5.43. Reformulate Theorem 8.42 as a minimum principle for intermediate eigenvalues.

8.5.44. Under the set-up of Theorem 8.42, explain why

$$\lambda_j = \max \left\{ \frac{v^T A v}{\| v \|^2} \;\middle|\; v \ne 0, \ v \cdot v_1 = \cdots = v \cdot v_{j-1} = 0 \right\}.$$

T 8.5.45. (a) Let A, M be positive definite n × n matrices and $\lambda_1 \ge \cdots \ge \lambda_n$ be their
generalized eigenvalues, as in Exercise 8.5.9. Prove that the largest generalized eigenvalue
can be characterized by the maximum principle $\lambda_1 = \max \{\, x^T A x \mid x^T M x = 1 \,\}$.
Hint: Use Exercise 8.5.27. (b) Prove the alternative maximum principle
$\lambda_1 = \max \left\{ \dfrac{x^T A x}{x^T M x} \;\middle|\; x \ne 0 \right\}$. (c) How would you characterize the smallest generalized
eigenvalue? (d) An intermediate generalized eigenvalue?
8.5.46. Use Exercise 8.5.45 to find the minimum and maximum of the rational functions

,  .  3x  2  y  ,  .  ^  ^  y  2  y  ,  ,  2  x  ~h  3  y  z  ,  ^  ^  ^  H-  6  x  y  4-  1 1  y  -\~  y  z  2  z 

4x2  +  5 y2  1  2x2  —  xy  +  y21  x2  +  3y2  +  2  z2  1  x2  +  2xy  +  3y2  +  2y  z  +  z2 

8.5.47. Let $A$ be a complete square matrix, not necessarily symmetric, with all positive
eigenvalues. Is the associated quadratic form $q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} > 0$ for all $\mathbf{x} \ne \mathbf{0}$?


8.6  Incomplete  Matrices 


Unfortunately, not all square matrices are complete. Matrices that do not have enough
(complex) eigenvectors to form a basis are considerably less pleasant to work with. However,
since they occasionally appear in applications, it is worth learning how to handle them.
There are two approaches: the first, named after the twentieth-century Russian/German
mathematician Issai Schur, is a generalization of the spectral theorem, and converts an
arbitrary square matrix into a similar, upper triangular matrix with the eigenvalues along
the diagonal. Thus, although not every matrix can be diagonalized, they can all be
“triangularized”. Applications of the Schur decomposition, including the numerical computation
of eigenvalues, can be found in [21


The second approach, due to the nineteenth-century French mathematician Camille
Jordan,† shows how to supplement the eigenvectors of an incomplete matrix in order to
obtain a basis in which the matrix assumes a simple, but now non-diagonal, canonical
(meaning distinguished) form. Applications of the Jordan canonical form will appear in
our study of linear systems of ordinary differential equations. We remark that the two
subsections are completely independent of one another.


The  Schur  Decomposition 


As noted above, the Schur decomposition is used to convert a square matrix into a similar
upper triangular matrix. The similarity transformation can be chosen to be represented
by a unitary matrix, which is the complex generalization of an orthogonal matrix.

Definition 8.43. A complex, square matrix $U$ is called unitary if it satisfies

\[ U^\dagger U = I, \qquad \text{where} \qquad U^\dagger = \overline{U^T} \tag{8.44} \]

denotes the Hermitian transpose, in which one first transposes and then takes complex
conjugates of all entries.

Thus, $U$ is unitary if and only if its inverse equals its Hermitian transpose: $U^{-1} = U^\dagger$.

is  unitary,  since  U~x  —  — 

The $(i, j)$ entry of the defining equation (8.44) is the Hermitian dot product between
the $i$th and $j$th columns of $U$, and hence $U$ is an $n \times n$ unitary matrix if and only if its
columns form an orthonormal basis of $\mathbb{C}^n$. In particular, a real matrix is unitary if and
only if it is orthogonal. The next result is proved in the same fashion as Proposition 4.23.
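The defining condition (8.44) is easy to test numerically. The following is a minimal sketch in plain Python; the helper names (`hermitian_transpose`, `matmul`, `is_unitary`) and the three sample matrices are illustrative choices made here, not taken from the text. It forms $U^\dagger U$ and compares it with the identity up to rounding error.

```python
import math

def hermitian_transpose(U):
    """Transpose, then take the complex conjugate of every entry."""
    n_rows, n_cols = len(U), len(U[0])
    return [[complex(U[i][j]).conjugate() for i in range(n_rows)]
            for j in range(n_cols)]

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_unitary(U, tol=1e-12):
    """Check U^dagger U = I entrywise, up to rounding error (U must be square)."""
    n = len(U)
    P = matmul(hermitian_transpose(U), U)
    return all(abs(P[i][j] - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
U = [[s, 1j * s], [1j * s, s]]   # complex example (illustrative, not from the text)
R = [[s, -s], [s, s]]            # real rotation: orthogonal, hence unitary
N = [[1, 1], [0, 1]]             # columns are not orthonormal

print(is_unitary(U), is_unitary(R), is_unitary(N))   # True True False
```

The check on `R` illustrates the remark that a real matrix is unitary exactly when it is orthogonal.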


For  example,  U  — 


† No relation to Wilhelm Jordan of Gauss-Jordan fame.


Proposition 8.44. If $U_1$ and $U_2$ are $n \times n$ unitary matrices, so is their product $U_1 U_2$.

The  Schur  decomposition  states  that  every  square  matrix  is  unitarily  similar  to  an  upper 
triangular  matrix.  The  method  of  proof  provides  a  recursive  algorithm  for  constructing 
the  decomposition. 

Theorem 8.45. Let $A$ be an $n \times n$ matrix, either real or complex. Then there exist a
unitary matrix $U$ and an upper triangular matrix $\Delta$ such that

\[ A = U \Delta U^\dagger = U \Delta U^{-1}. \tag{8.45} \]

The diagonal entries of $\Delta$ are the eigenvalues of $A$.


In particular, if all eigenvalues of $A$ are real, then $U = Q$ can be chosen to be a (real)
orthogonal matrix, and $\Delta$ is also a real matrix. This follows from the construction outlined
in the proof.

Warning. The Schur decomposition (8.45) is not unique. As the method of proof makes
clear, there are many inequivalent choices for the matrices $U$ and $\Delta$.

Proof of Theorem 8.45: The proof proceeds by induction on the size of $A$. According
to Theorem 8.11, $A$ has at least one, possibly complex, eigenvalue $\lambda_1$. Let $\mathbf{u}_1 \in \mathbb{C}^n$
be a corresponding unit eigenvector, so its Hermitian norm is $\|\mathbf{u}_1\| = 1$. Let $U_1$ be an
$n \times n$ unitary matrix whose first column is the unit eigenvector $\mathbf{u}_1$. In practice, $U_1$ can be
constructed by applying the Gram-Schmidt process to any basis of $\mathbb{C}^n$ whose first element
is the eigenvector $\mathbf{u}_1$. The eigenvector equation $A \mathbf{u}_1 = \lambda_1 \mathbf{u}_1$ forms the first column of the
matrix product equation

\[ A U_1 = U_1 B, \qquad \text{where} \qquad B = U_1^\dagger A U_1 = \begin{pmatrix} \lambda_1 & \mathbf{r} \\ \mathbf{0} & C \end{pmatrix}, \]

with $C$ an $(n - 1) \times (n - 1)$ matrix and $\mathbf{r}$ a row vector. By our induction hypothesis,
there is an $(n - 1) \times (n - 1)$ unitary matrix $V$ such that $V^\dagger C V = T$ is an upper triangular
$(n - 1) \times (n - 1)$ matrix. Set $U_2 = \begin{pmatrix} 1 & \mathbf{0} \\ \mathbf{0} & V \end{pmatrix}$. It is easily checked that $U_2$ is also unitary,
and, moreover,

\[ U_2^\dagger B U_2 = \begin{pmatrix} \lambda_1 & \mathbf{r} V \\ \mathbf{0} & T \end{pmatrix} = \Delta \]

is upper triangular. The unitary product matrix $U = U_1 U_2$ yields the desired result:

\[ U^\dagger A U = (U_1 U_2)^\dagger A \, U_1 U_2 = U_2^\dagger U_1^\dagger A \, U_1 U_2 = U_2^\dagger B U_2 = \Delta, \]

which establishes the Schur decomposition formula (8.45). Finally, since $A$ and $\Delta$ are
similar matrices, they have the same eigenvalues, which justifies the final statement of the
theorem. Q.E.D.
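The induction step of the proof can be tried out directly. The sketch below, in plain Python and restricted to real matrices with a known real eigenvector, completes the eigenvector to an orthonormal basis by Gram-Schmidt and forms $B = U_1^T A U_1$. The function names and the $2 \times 2$ test matrix are hypothetical choices made for illustration; they do not come from the text.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def orthonormal_completion(v):
    """Gram-Schmidt applied to v followed by standard basis vectors.

    Returns U1 whose first column is v/|v|. The standard basis vector
    matching the largest entry of v is dropped so the list stays independent.
    """
    n = len(v)
    skip = max(range(n), key=lambda i: abs(v[i]))
    candidates = [list(v)] + [[1.0 if i == k else 0.0 for i in range(n)]
                              for k in range(n) if k != skip]
    basis = []
    for w in candidates:
        for q in basis:                       # subtract projections onto earlier vectors
            c = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return transpose(basis)                   # basis vectors become the columns of U1

def deflate(A, eigvec):
    """One induction step of the proof: B = U1^T A U1 for a real eigenvector."""
    U1 = orthonormal_completion(eigvec)
    B = matmul(transpose(U1), matmul(A, U1))
    return U1, B

# Hypothetical example: double eigenvalue 2 with a single eigenvector (1, -1).
A = [[3.0, 1.0], [-1.0, 1.0]]
U1, B = deflate(A, [1.0, -1.0])
for row in B:
    print([round(x, 10) for x in row])
```

For this $2 \times 2$ matrix the first step already produces an upper triangular $B$ with both diagonal entries equal to the eigenvalue 2; for larger matrices one would repeat the step on the lower right block $C$, as in the proof.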


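Example 8.46. The matrix $A = \begin{pmatrix} 2 & 0 & -1 \\ 0 & 2 & 0 \\ 4 & 4 & -2 \end{pmatrix}$ has a simple eigenvalue $\lambda_1 = 2$, with
eigenvector $\mathbf{v}_1 = (-1, 1, 0)^T$, and a double eigenvalue $\lambda_2 = 0$, with only one independent
eigenvector $\mathbf{v}_2 = (1, 0, 2)^T$. Thus $A$ is incomplete, and so not diagonalizable. To construct
a Schur decomposition, we begin with the first eigenvector $\mathbf{v}_1$ and apply the Gram-Schmidt
process to the basis $\mathbf{v}_1, \mathbf{e}_2, \mathbf{e}_3$ to obtain the orthogonal matrix

\[ U_1 = \begin{pmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

The resulting similar matrix†

\[ B = U_1^T A U_1 = \begin{pmatrix} 2 & 0 & \frac{1}{\sqrt{2}} \\ 0 & 2 & -\frac{1}{\sqrt{2}} \\ 0 & 4\sqrt{2} & -2 \end{pmatrix} \]

has its first column in upper triangular form. To continue the procedure, we extract the lower $2 \times 2$ submatrix
$C = \begin{pmatrix} 2 & -\frac{1}{\sqrt{2}} \\ 4\sqrt{2} & -2 \end{pmatrix}$, and find that it has a single (incomplete) eigenvalue 0, with unit
eigenvector $\begin{pmatrix} \frac{1}{3} \\ \frac{2\sqrt{2}}{3} \end{pmatrix}$. The corresponding orthogonal matrix $V = \begin{pmatrix} \frac{1}{3} & -\frac{2\sqrt{2}}{3} \\ \frac{2\sqrt{2}}{3} & \frac{1}{3} \end{pmatrix}$ will convert
$C$ to upper triangular form $V^T C V = \begin{pmatrix} 0 & -\frac{9}{\sqrt{2}} \\ 0 & 0 \end{pmatrix}$. Therefore, $U_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{3} & -\frac{2\sqrt{2}}{3} \\ 0 & \frac{2\sqrt{2}}{3} & \frac{1}{3} \end{pmatrix}$ brings $B$ into the upper triangular form

\[ \Delta = U_2^T B U_2 = U^T A U = \begin{pmatrix} 2 & \frac{2}{3} & \frac{1}{3\sqrt{2}} \\ 0 & 0 & -\frac{9}{\sqrt{2}} \\ 0 & 0 & 0 \end{pmatrix}, \qquad \text{where} \qquad U = U_1 U_2 = \begin{pmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{3\sqrt{2}} & -\frac{2}{3} \\ \frac{1}{\sqrt{2}} & \frac{1}{3\sqrt{2}} & -\frac{2}{3} \\ 0 & \frac{2\sqrt{2}}{3} & \frac{1}{3} \end{pmatrix} \]

is the desired orthogonal (unitary) matrix. Use of a computer to carry out the detailed
calculations is essential in most examples.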


Exercises 

8.6.1.  Establish  a  Schur  Decomposition  for  the  matrices  (a) 


1  -1 
1  3 


(h) 


1  -2 

2  1 


(c) 


8 

6 


9 

■7 


(d) 


1 

2 


5 

1 


(e) 


2  -1  2  \ 

(  0  2  -1\ 

-2  3  -1  ,  (f) 

-1-1  1  . 

\ —6  6-5 / 

1 — 1 

O 

O 

8.6.2.  Show  that  a  real  unitary  matrix  is  an  orthogonal  matrix. 


0  8.6.3.  Prove  Proposition  8.44. 

7  8.6.4.  Write  out  a  new  proof  of  the  Spectral  Theorem  8.38  based  on  the  Schur  Decomposition. 


♥ 8.6.5. A complex matrix $A$ is called normal if it commutes with its Hermitian transpose:
$A^\dagger A = A A^\dagger$. (a) Show that every real symmetric matrix is normal. (b) Show that every
unitary matrix is normal. (c) Show that every real orthogonal matrix is normal. (d) Show
that an upper triangular matrix is normal if and only if it is diagonal. (e) Show that the
eigenvectors of a normal matrix form an orthogonal basis of $\mathbb{C}^n$ under the Hermitian dot
product. (f) Show that the converse is true: a matrix has an orthogonal eigenvector basis
of $\mathbb{C}^n$ if and only if it is normal. Hint: Use the Schur Decomposition. (g) How can you tell
when a real matrix has a real orthonormal eigenvector basis?


† Since all matrices are real in this example, the Hermitian transpose $^\dagger$ reduces to the ordinary
transpose $^T$.



The  Jordan  Canonical  Form 
We now turn to the more sophisticated Jordan canonical form. Throughout this section, $A$
will be an $n \times n$ matrix, with either real or complex entries. We let $\lambda_1, \ldots, \lambda_k$ denote the
distinct eigenvalues of $A$, so $\lambda_i \ne \lambda_j$ for $i \ne j$. We recall that Theorem 8.11 guarantees that
every matrix has at least one (complex) eigenvalue, so $k \ge 1$. Moreover, we are assuming
that $k < n$, since otherwise $A$ would be complete.

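Definition 8.47. Let $A$ be a square matrix. A Jordan chain of length $j$ for $A$ is a sequence
of non-zero vectors $\mathbf{w}_1, \ldots, \mathbf{w}_j \in \mathbb{C}^n$ that satisfies

\[ A \mathbf{w}_1 = \lambda \mathbf{w}_1, \qquad A \mathbf{w}_i = \lambda \mathbf{w}_i + \mathbf{w}_{i-1}, \qquad i = 2, \ldots, j, \tag{8.46} \]

where $\lambda$ is an eigenvalue of $A$. A Jordan chain associated with a zero eigenvalue, which
requires that $A$ be singular, is called a null Jordan chain, and satisfies

\[ A \mathbf{w}_1 = \mathbf{0}, \qquad A \mathbf{w}_i = \mathbf{w}_{i-1}, \qquad i = 2, \ldots, j. \tag{8.47} \]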


Note that the initial vector $\mathbf{w}_1$ in a Jordan chain is a genuine eigenvector, and so
Jordan chains exist only when $\lambda$ is an eigenvalue. The rest, $\mathbf{w}_2, \ldots, \mathbf{w}_j$, are generalized
eigenvectors, in accordance with the following definition.

Definition 8.48. A nonzero vector $\mathbf{w} \ne \mathbf{0}$ such that

\[ (A - \lambda I)^k \mathbf{w} = \mathbf{0} \tag{8.48} \]

for some $k > 0$ and $\lambda \in \mathbb{C}$ is called a generalized eigenvector† of the matrix $A$.


Note that every ordinary eigenvector is automatically a generalized eigenvector, since
we can just take $k = 1$ in (8.48); but the converse is not necessarily valid. We shall call the
minimal value of $k$ for which (8.48) holds the index of the generalized eigenvector. Thus,
an ordinary eigenvector is a generalized eigenvector of index 1. Since $A - \lambda I$ is nonsingular
whenever $\lambda$ is not an eigenvalue of $A$, its $k$th power $(A - \lambda I)^k$ is also nonsingular. Therefore,
generalized eigenvectors can exist only when $\lambda$ is an ordinary eigenvalue of $A$; there are
no additional “generalized eigenvalues”.


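Lemma 8.49. The $i$th vector $\mathbf{w}_i$ in a Jordan chain (8.46) is a generalized eigenvector of
index $i$.


Proof: By definition, $(A - \lambda I)\mathbf{w}_1 = \mathbf{0}$, and so $\mathbf{w}_1$ is an eigenvector. Next, we have
$(A - \lambda I)\mathbf{w}_2 = \mathbf{w}_1 \ne \mathbf{0}$, while $(A - \lambda I)^2 \mathbf{w}_2 = (A - \lambda I)\mathbf{w}_1 = \mathbf{0}$. Thus, $\mathbf{w}_2$ is a generalized
eigenvector of index 2. A simple induction proves that $(A - \lambda I)^{i-1}\mathbf{w}_i = \mathbf{w}_1 \ne \mathbf{0}$ while
$(A - \lambda I)^i \mathbf{w}_i = \mathbf{0}$. Q.E.D.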


Example 8.50. Consider the $3 \times 3$ matrix $A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}$. The only eigenvalue is
$\lambda = 2$, and $A - 2I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$. We claim that the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$
form a Jordan chain. Indeed, $A\mathbf{e}_1 = 2\mathbf{e}_1$, and hence $\mathbf{e}_1 \in \ker(A - 2I)$ is a genuine
eigenvector. Furthermore, $A\mathbf{e}_2 = 2\mathbf{e}_2 + \mathbf{e}_1$, and $A\mathbf{e}_3 = 2\mathbf{e}_3 + \mathbf{e}_2$, as you can easily check.
Thus, $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ satisfy the Jordan chain equations (8.46) for the eigenvalue $\lambda = 2$. Note
that $\mathbf{e}_2$ lies in the kernel of $(A - 2I)^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, and so is a generalized eigenvector
of index 2. Indeed, every vector of the form $\mathbf{w} = a\mathbf{e}_1 + b\mathbf{e}_2$ with $b \ne 0$ is a generalized
eigenvector of index 2. (When $b = 0$, $a \ne 0$, the vector $\mathbf{w} = a\mathbf{e}_1$ is an ordinary eigenvector
of index 1.) Finally, $(A - 2I)^3 = O$, and so every nonzero vector $\mathbf{0} \ne \mathbf{v} \in \mathbb{R}^3$, including
$\mathbf{e}_3$, is a generalized eigenvector of index 3 or less.


† Despite the common terminology, this is not the same concept as developed in Exercise 8.5.8.
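The index of a generalized eigenvector can be found mechanically by applying $A - \lambda I$ repeatedly until the result vanishes. Here is a minimal sketch in plain Python using the matrix of Example 8.50; the function names `apply` and `index_of` are illustrative, and the input vector is assumed to be nonzero. Since the kernels of the powers $(A - \lambda I)^k$ stop growing by $k = n$, trying $k = 1, \ldots, n$ suffices.

```python
def apply(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def index_of(A, lam, w):
    """Smallest k >= 1 with (A - lam I)^k w = 0, or None if no such k exists.

    Assumes w is nonzero; integer entries keep the arithmetic exact.
    """
    n = len(A)
    N = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    v = list(w)
    for k in range(1, n + 1):
        v = apply(N, v)
        if all(x == 0 for x in v):
            return k
    return None

A = [[2, 1, 0], [0, 2, 1], [0, 0, 2]]          # the matrix of Example 8.50
e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]

print([index_of(A, 2, e) for e in (e1, e2, e3)])   # [1, 2, 3]
print(index_of(A, 2, [5, -1, 0]))                  # 2, since the e2 coefficient is nonzero
print(index_of(A, 3, e1))                          # None: 3 is not an eigenvalue
```

The last call reflects the remark that generalized eigenvectors exist only for genuine eigenvalues.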


A basis of $\mathbb{R}^n$ or $\mathbb{C}^n$ is called a Jordan basis for the matrix $A$ if it consists of one or more
Jordan chains that have no elements in common. Thus, for the matrix in Example 8.50,
the standard basis $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ is, in fact, a Jordan basis. An eigenvector basis, if such exists,
qualifies as a Jordan basis, since each eigenvector belongs to a Jordan chain of length 1.
Jordan bases are the desired extension of eigenvector bases, and every square matrix has
one. The proof of the following Jordan Basis Theorem will appear below.


Theorem 8.51. Every $n \times n$ matrix admits a Jordan basis of $\mathbb{C}^n$. The first elements of
the Jordan chains form a maximal set of linearly independent eigenvectors. Moreover, the
number of generalized eigenvectors in the Jordan basis that belong to the Jordan chains
associated with the eigenvalue $\lambda$ is the same as the eigenvalue’s multiplicity.


Example 8.52. Consider the $5 \times 5$ matrix

\[ A = \begin{pmatrix} -1 & 0 & 1 & 0 & 0 \\ -2 & 2 & -4 & 1 & 1 \\ -1 & 0 & -3 & 0 & 0 \\ -4 & -1 & 3 & 1 & 0 \\ 4 & 0 & 2 & -1 & 0 \end{pmatrix}. \]

Its characteristic equation is found to be

\[ p_A(\lambda) = \det(A - \lambda I) = -(\lambda^5 + \lambda^4 - 5\lambda^3 - \lambda^2 + 8\lambda - 4) = -(\lambda - 1)^3 (\lambda + 2)^2 = 0, \]

and hence $A$ has two eigenvalues: $\lambda_1 = 1$, which is a triple eigenvalue, and $\lambda_2 = -2$, which
is double. Solving the associated homogeneous systems $(A - \lambda_j I)\mathbf{v} = \mathbf{0}$, we find that, up
to constant multiple, there are only two eigenvectors: $\mathbf{v}_1 = (0, 0, 0, -1, 1)^T$ for $\lambda_1 = 1$
and, anticipating our final numbering, $\mathbf{v}_4 = (-1, 1, 1, -2, 0)^T$ for $\lambda_2 = -2$. Thus, $A$ is far
from complete.

To construct a Jordan basis, we first note that since $A$ has 2 linearly independent
eigenvectors, the Jordan basis will contain two Jordan chains: the one associated with the
triple eigenvalue $\lambda_1 = 1$ will have length 3, while $\lambda_2 = -2$ leads to a Jordan chain of
length 2. To construct the former, we need to first solve the system $(A - I)\mathbf{w} = \mathbf{v}_1$. Note
that the coefficient matrix is singular (it must be, since 1 is an eigenvalue), and the
general solution is $\mathbf{w} = \mathbf{v}_2 + t\,\mathbf{v}_1$, where $\mathbf{v}_2 = (0, 1, 0, 0, -1)^T$, and $t$ is the free variable. The
appearance of an arbitrary multiple of the eigenvector $\mathbf{v}_1$ in the solution is not unexpected;
indeed, the kernel of $A - I$ is the eigenspace for $\lambda_1 = 1$. We can choose any solution, e.g., $\mathbf{v}_2$,
as the second element in the Jordan chain. We then solve $(A - I)\mathbf{w} = \mathbf{v}_2$ for $\mathbf{w} = \mathbf{v}_3 + t\,\mathbf{v}_1$,
where $\mathbf{v}_3 = (0, 0, 0, 1, 0)^T$ can be used as the final element of this Jordan chain. Similarly,
to construct the Jordan chain for the second eigenvalue, we solve $(A + 2I)\mathbf{w} = \mathbf{v}_4$ for
$\mathbf{w} = \mathbf{v}_5 + t\,\mathbf{v}_4$, where $\mathbf{v}_5 = (-1, 0, 0, -2, 1)^T$.

Thus, the desired Jordan basis is

\[ \mathbf{v}_1 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ -1 \\ 1 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ -1 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{v}_4 = \begin{pmatrix} -1 \\ 1 \\ 1 \\ -2 \\ 0 \end{pmatrix}, \quad \mathbf{v}_5 = \begin{pmatrix} -1 \\ 0 \\ 0 \\ -2 \\ 1 \end{pmatrix}, \]

with $A\mathbf{v}_1 = \mathbf{v}_1$, $A\mathbf{v}_2 = \mathbf{v}_2 + \mathbf{v}_1$, $A\mathbf{v}_3 = \mathbf{v}_3 + \mathbf{v}_2$, $A\mathbf{v}_4 = -2\mathbf{v}_4$, $A\mathbf{v}_5 = -2\mathbf{v}_5 + \mathbf{v}_4$.
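The chain construction used in Example 8.52, repeatedly solving $(A - \lambda I)\mathbf{w} = \mathbf{w}_{\text{previous}}$, can be sketched in plain Python with exact rational arithmetic. The function names below are illustrative, and the solver simply sets the free variables to zero, so it may return a chain vector that differs from the one in the example by a multiple of the eigenvector. The sketch relies on each eigenvalue having a single Jordan block, which holds for this matrix but not in general.

```python
from fractions import Fraction

def solve_particular(M, b):
    """One solution of M x = b by Gauss-Jordan elimination (free variables = 0).

    Returns None when the system is inconsistent.
    """
    n_rows, n_cols = len(M), len(M[0])
    aug = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    pivots, r = [], 0
    for c in range(n_cols):
        p = next((i for i in range(r, n_rows) if aug[i][c] != 0), None)
        if p is None:
            continue                               # no pivot in this column
        aug[r], aug[p] = aug[p], aug[r]
        piv = aug[r][c]
        aug[r] = [x / piv for x in aug[r]]
        for i in range(n_rows):
            if i != r and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    if any(aug[i][-1] != 0 for i in range(r, n_rows)):
        return None                                # zero row with nonzero right-hand side
    x = [Fraction(0)] * n_cols
    for row_idx, c in enumerate(pivots):
        x[c] = aug[row_idx][-1]
    return x

def shifted(A, lam):
    """The matrix A - lam I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

def apply(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def jordan_chain(A, lam, eigvec, max_length):
    """Extend an eigenvector to a Jordan chain by solving (A - lam I) w = previous."""
    M = shifted(A, lam)
    chain = [[Fraction(x) for x in eigvec]]
    while len(chain) < max_length:
        w = solve_particular(M, chain[-1])
        if w is None:                              # chain cannot be extended further
            break
        chain.append(w)
    return chain

A = [[-1, 0, 1, 0, 0], [-2, 2, -4, 1, 1], [-1, 0, -3, 0, 0],
     [-4, -1, 3, 1, 0], [4, 0, 2, -1, 0]]          # the matrix of Example 8.52
chain_1 = jordan_chain(A, 1, [0, 0, 0, -1, 1], 5)   # stops at length 3
chain_2 = jordan_chain(A, -2, [-1, 1, 1, -2, 0], 5) # stops at length 2
print(len(chain_1), len(chain_2))
```

Asking for up to five vectors in each chain shows that the construction stops by itself once the next linear system becomes inconsistent, reproducing the chain lengths 3 and 2 found in the example.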


Just  as  an  eigenvector  basis  diagonalizes  a  complete  matrix,  a  Jordan  basis  provides  a 
particularly  simple  form  for  an  incomplete  matrix,  known  as  the  Jordan  canonical  form. 


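Definition 8.53. An $n \times n$ matrix of the form†

\[ J_{\lambda,n} = \begin{pmatrix} \lambda & 1 & & & \\ & \lambda & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda & 1 \\ & & & & \lambda \end{pmatrix}, \tag{8.49} \]

in which $\lambda$ is a real or complex number, is known as a Jordan block.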


In particular, a $1 \times 1$ Jordan block is merely a scalar $J_{\lambda,1} = \lambda$. Since every matrix has
at least one (complex) eigenvector (see Theorem 8.11), the Jordan block matrices have
the least possible number of independent eigenvectors. The following result is immediate.

Lemma 8.54. The $n \times n$ Jordan block matrix $J_{\lambda,n}$ has a single eigenvalue, $\lambda$, and a single
independent eigenvector, $\mathbf{e}_1$. The standard basis vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$ form a Jordan chain
for $J_{\lambda,n}$.


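Definition 8.55. A Jordan matrix is a square matrix of block diagonal form

\[ J = \operatorname{diag}(J_{\lambda_1,n_1}, J_{\lambda_2,n_2}, \ldots, J_{\lambda_k,n_k}) = \begin{pmatrix} J_{\lambda_1,n_1} & & & \\ & J_{\lambda_2,n_2} & & \\ & & \ddots & \\ & & & J_{\lambda_k,n_k} \end{pmatrix}, \tag{8.50} \]

in which one or more Jordan blocks, not necessarily of the same size, lie along the diagonal,
while the blank off-diagonal blocks are all zero.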


Note that the only possibly non-zero entries in a Jordan matrix are those on the diagonal,
which can have any complex value, including 0, and those on the superdiagonal, which are
either 1 or 0. The positions of the superdiagonal 1’s uniquely prescribe the Jordan blocks.


† All non-displayed entries are zero.
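For example, the $6 \times 6$ matrices

\[ \begin{pmatrix} 1&0&0&0&0&0 \\ 0&2&0&0&0&0 \\ 0&0&3&0&0&0 \\ 0&0&0&3&0&0 \\ 0&0&0&0&2&0 \\ 0&0&0&0&0&1 \end{pmatrix}, \qquad \begin{pmatrix} -1&1&0&0&0&0 \\ 0&-1&1&0&0&0 \\ 0&0&-1&1&0&0 \\ 0&0&0&-1&0&0 \\ 0&0&0&0&-1&1 \\ 0&0&0&0&0&-1 \end{pmatrix}, \qquad \begin{pmatrix} 0&1&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&1&1&0&0 \\ 0&0&0&1&0&0 \\ 0&0&0&0&2&1 \\ 0&0&0&0&0&2 \end{pmatrix} \]

are all Jordan matrices: the first is a diagonal matrix, consisting of 6 distinct $1 \times 1$ Jordan
blocks; the second has a $4 \times 4$ Jordan block followed by a $2 \times 2$ block that happen to have
the same diagonal entries; the last has three $2 \times 2$ Jordan blocks with respective diagonal
entries 0, 1, 2.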

As a direct corollary of Lemma 8.54 combined with the matrix’s block structure, cf. Exercise
8.2.50, we obtain a complete classification of the eigenvectors and eigenvalues of a
Jordan matrix.

Lemma 8.56. The Jordan matrix (8.50) has eigenvalues $\lambda_1, \ldots, \lambda_k$. The standard basis
vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$ form a Jordan basis, which is partitioned into nonoverlapping Jordan
chains labeled by the Jordan blocks.


Thus, in the preceding examples of Jordan matrices, the first has three double eigenvalues,
1 and 2 and 3, and corresponding linearly independent eigenvectors $\mathbf{e}_1, \mathbf{e}_6$, and $\mathbf{e}_2, \mathbf{e}_5$,
and $\mathbf{e}_3, \mathbf{e}_4$, each of which belongs to a Jordan chain of length 1. The second matrix has only
one eigenvalue, $-1$, but two independent eigenvectors $\mathbf{e}_1, \mathbf{e}_5$, and hence two Jordan chains,
namely $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4$, and $\mathbf{e}_5, \mathbf{e}_6$. The last has three eigenvalues 0, 1, 2, three eigenvectors
$\mathbf{e}_1, \mathbf{e}_3, \mathbf{e}_5$, and three Jordan chains of length 2: $\mathbf{e}_1, \mathbf{e}_2$, and $\mathbf{e}_3, \mathbf{e}_4$, and $\mathbf{e}_5, \mathbf{e}_6$. In particular,
the only complete Jordan matrices are the diagonal matrices, all of whose Jordan blocks
are of size $1 \times 1$.
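The bookkeeping in Lemma 8.56 is simple enough to automate. The following minimal Python sketch assembles a Jordan matrix from a list of (eigenvalue, block size) pairs and records where each block starts; those starting positions index the standard basis vectors that are genuine eigenvectors. The function name `jordan_matrix` and its return convention are assumptions made for this illustration.

```python
def jordan_matrix(blocks):
    """Block diagonal Jordan matrix from a list of (eigenvalue, size) pairs.

    Returns the matrix and the 0-based index at which each block starts.
    """
    n = sum(size for _, size in blocks)
    J = [[0] * n for _ in range(n)]
    starts, pos = [], 0
    for lam, size in blocks:
        starts.append(pos)
        for i in range(pos, pos + size):
            J[i][i] = lam
            if i + 1 < pos + size:
                J[i][i + 1] = 1        # superdiagonal 1 only inside a block
        pos += size
    return J, starts

# Three 2 x 2 blocks with diagonal entries 0, 1, 2.
J, starts = jordan_matrix([(0, 2), (1, 2), (2, 2)])
for row in J:
    print(row)
print([s + 1 for s in starts])         # [1, 3, 5]: e1, e3, e5 are the eigenvectors

# A 4 x 4 block followed by a 2 x 2 block, both with eigenvalue -1.
_, starts_2 = jordan_matrix([(-1, 4), (-1, 2)])
print([s + 1 for s in starts_2])       # [1, 5]: e1 and e5
```

The column of $J$ at a block's starting index contains only the eigenvalue on the diagonal, which is exactly the statement that the corresponding standard basis vector is an eigenvector.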

The  Jordan  canonical  form  follows  from  the  Jordan  Basis  Theorem  8.51. 

Theorem 8.57. Let $A$ be an $n \times n$ real or complex matrix. Let $S = (\,\mathbf{w}_1 \ \mathbf{w}_2 \ \ldots \ \mathbf{w}_n\,)$
be a matrix whose columns form a Jordan basis of $A$. Then $S$ places $A$ into the Jordan
canonical form

\[ S^{-1} A S = J = \operatorname{diag}(J_{\lambda_1,n_1}, J_{\lambda_2,n_2}, \ldots, J_{\lambda_k,n_k}), \qquad \text{or, equivalently,} \qquad A = S J S^{-1}. \tag{8.51} \]

The diagonal entries of the similar Jordan matrix $J$ are the eigenvalues of $A$. In particular,
$A$ is complete (diagonalizable) if and only if every Jordan block is of size $1 \times 1$ or,
equivalently, all Jordan chains are of length 1. The Jordan canonical form of $A$ is uniquely
determined up to a permutation of the diagonal Jordan blocks.
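For instance, the matrix $A$ of Example 8.52 has the following Jordan basis matrix and Jordan canonical form:

\[ S = \begin{pmatrix} 0 & 0 & 0 & -1 & -1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ -1 & 0 & 1 & -2 & -2 \\ 1 & -1 & 0 & 0 & 1 \end{pmatrix}, \qquad J = S^{-1} A S = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -2 & 1 \\ 0 & 0 & 0 & 0 & -2 \end{pmatrix}. \]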
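Formula (8.51) can be checked without computing an inverse, since $S^{-1} A S = J$ is equivalent to $A S = S J$ once the columns of $S$ are known to form a basis. The sketch below does this in exact integer arithmetic for the $5 \times 5$ matrix and Jordan chains of Example 8.52; the helper names are illustrative choices.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[-1, 0, 1, 0, 0], [-2, 2, -4, 1, 1], [-1, 0, -3, 0, 0],
     [-4, -1, 3, 1, 0], [4, 0, 2, -1, 0]]

# Jordan chains from Example 8.52: v1, v2, v3 for eigenvalue 1; v4, v5 for -2.
chains = [[0, 0, 0, -1, 1], [0, 1, 0, 0, -1], [0, 0, 0, 1, 0],
          [-1, 1, 1, -2, 0], [-1, 0, 0, -2, 1]]
S = [list(row) for row in zip(*chains)]      # chain vectors become columns of S

J = [[1, 1, 0, 0, 0],
     [0, 1, 1, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, -2, 1],
     [0, 0, 0, 0, -2]]                       # diag(J_{1,3}, J_{-2,2})

print(matmul(A, S) == matmul(S, J))          # True
```

Reading the identity $A S = S J$ column by column recovers the Jordan chain equations (8.46) for each chain.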
Finally, let us outline a proof of the Jordan Basis Theorem 8.51.

Lemma 8.58. If $\mathbf{v}_1, \ldots, \mathbf{v}_n$ forms a Jordan basis for the matrix $A$, it also forms a Jordan
basis for $B = A - c\,I$, for any scalar $c$.

Proof: We note that the eigenvalues of $B$ are all of the form $\lambda - c$, where $\lambda$ is an eigenvalue
of $A$. Moreover, given a Jordan chain $\mathbf{w}_1, \ldots, \mathbf{w}_j$ of $A$, we have

\[ B\mathbf{w}_1 = (\lambda - c)\,\mathbf{w}_1, \qquad B\mathbf{w}_i = (\lambda - c)\,\mathbf{w}_i + \mathbf{w}_{i-1}, \qquad i = 2, \ldots, j, \]

so $\mathbf{w}_1, \ldots, \mathbf{w}_j$ is also a Jordan chain for $B$ corresponding to the eigenvalue $\lambda - c$. Q.E.D.

The proof of Theorem 8.51 will be done by induction on the size $n$ of the matrix. The
case $n = 1$ is trivial, since every nonzero element of $\mathbb{C}$ is a Jordan basis for a $1 \times 1$ matrix.
To perform the induction step, we assume that the result is valid for all matrices of size
$\le n - 1$. Let $A$ be an $n \times n$ matrix. According to Theorem 8.11, $A$ has at least one complex
eigenvalue $\lambda_1$. Let $B = A - \lambda_1 I$. Since $\lambda_1$ is an eigenvalue of $A$, we know that 0 is an
eigenvalue of $B$. This means that $\ker B \ne \{\mathbf{0}\}$, and so $r = \operatorname{rank} B < n$. Moreover, by
Lemma 8.58, a Jordan basis of $B$ is also a Jordan basis for $A$, and so we can concentrate
all our attention on the singular matrix $B$ from now on.

Recall that $W = \operatorname{img} B \subset \mathbb{C}^n$ is an invariant subspace for $B$, i.e., $B\mathbf{w} \in W$ whenever
$\mathbf{w} \in W$. Moreover, since $B$ is singular, $\dim W = r = \operatorname{rank} B < n$. Thus, by fixing a basis
of $W$, we can realize the restriction $B \colon W \to W$ as multiplication by an $r \times r$ matrix. The
fact that $r < n$ allows us to invoke the induction hypothesis and deduce the existence of a
Jordan basis $\mathbf{w}_1, \ldots, \mathbf{w}_r \in W \subset \mathbb{C}^n$ for the action of $B$ on the subspace $W$. Our goal is to
complete this collection to a full Jordan basis on $\mathbb{C}^n$.

To this end, we append two additional kinds of vectors. Suppose that the Jordan
basis of $W$ contains $k$ null Jordan chains associated with its zero eigenvalue. Each null
Jordan chain consists of vectors $\mathbf{w}_1, \ldots, \mathbf{w}_j \in W$ satisfying $B\mathbf{w}_1 = \mathbf{0}$, $B\mathbf{w}_2 = \mathbf{w}_1$, $\ldots$,
$B\mathbf{w}_j = \mathbf{w}_{j-1}$; cf. (8.47). The number $k$ of null Jordan chains is equal to the number
of linearly independent null eigenvectors of $B$ that belong to $W = \operatorname{img} B$, that is, $k =
\dim(\ker B \cap \operatorname{img} B)$. To each such null Jordan chain, we append a vector $\mathbf{w}_{j+1} \in \mathbb{C}^n$
such that $B\mathbf{w}_{j+1} = \mathbf{w}_j$, noting that $\mathbf{w}_{j+1}$ exists because $\mathbf{w}_j \in \operatorname{img} B$. We deduce that
$\mathbf{w}_1, \ldots, \mathbf{w}_{j+1} \in \mathbb{C}^n$ forms a null Jordan chain, of length $j + 1$. Having extended all the
null Jordan chains in $W$, the resulting collection consists of $r + k$ vectors in $\mathbb{C}^n$ arranged
in nonoverlapping Jordan chains. To complete to a basis, we append $n - r - k$ additional
linearly independent null vectors $\mathbf{z}_1, \ldots, \mathbf{z}_{n-r-k} \in \ker B \setminus \operatorname{img} B$ that lie outside its image.
Since $B\mathbf{z}_i = \mathbf{0}$, each $\mathbf{z}_i$ forms a null Jordan chain of length 1. We claim that the complete
collection consisting of the non-null Jordan chains in $W$, the $k$ extended null chains, and
the additional null vectors $\mathbf{z}_1, \ldots, \mathbf{z}_{n-r-k}$, forms the desired Jordan basis. By construction,
it consists of nonoverlapping Jordan chains. The only remaining issue is the proof that
the resulting collection of vectors is linearly independent; this is left as a challenge for the
reader in Exercise 8.6.25. Q.E.D.
With the Jordan canonical form in hand, the general result characterizing complex
invariant subspaces of a general square matrix can now be stated.

Theorem 8.59. Every complex invariant subspace of a square matrix $A$ is spanned by a
finite number of Jordan chains.

Proof: The proof proceeds in the same manner as that of Theorem 8.30. We assume that
$\mathbf{v}_k$ is the last vector in its Jordan chain. We deduce that $A\mathbf{w} - \lambda_k \mathbf{w}$ is a linear combination
of $\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}$ with non-vanishing coefficients, and so the induction step remains valid in
this case. The details are left to the reader. Q.E.D.

Similarly, a complex conjugate pair of Jordan chains of order $k$ produces two distinct
$k$-dimensional complex invariant subspaces, and hence a real invariant subspace of dimension
$2k$ obtained from their real and imaginary parts. A detailed statement of the nature of
the most general real invariant subspace of a matrix is also left to the reader.


Exercises 


8.6.6.  For  each  of  the  following  Jordan  matrices,  identify  the  Jordan  blocks.  Write  down  the 
eigenvalues,  the  eigenvectors,  and  the  Jordan  basis.  Clearly  identify  the  Jordan  chains. 
8.6.7. Find a Jordan basis and Jordan canonical form for each of the following matrices:
8.6.8. Write down all possible $4 \times 4$ Jordan matrices that have only $\lambda = 2$ as an eigenvalue.

8.6.9.  Write  down  all  3  x  3  Jordan  matrices  that  have  eigenvalues  2  and  5  (and  no  others). 


8.6.10.  Write  down  a  formula  for  the  inverse  of  a  Jordan  block  matrix. 

Hint :  Try  some  small  examples  first  to  help  in  figuring  out  the  pattern. 

8.6.11.  True  or  false:  If  A  is  complete,  every  generalized  eigenvector  is  an  ordinary  eigenvector. 

8.6.12.  True  or  false:  Every  generalized  eigenvector  belongs  to  a  Jordan  chain. 

8.6.13. True or false: If $\mathbf{w}$ is a generalized eigenvector of $A$, then $\mathbf{w}$ is a generalized eigenvector
of every power $A^j$, for $j \in \mathbb{N}$, thereof.


8.6.14. True or false: If $\mathbf{w}$ is a generalized eigenvector of index $k$ of $A$, then $\mathbf{w}$ is an ordinary
eigenvector of $A^k$.

0  8.6.15.  Suppose  you  know  all  eigenvalues  of  a  matrix  as  well  as  their  algebraic  and  geometric 
multiplicities.  Can  you  determine  the  matrix’s  Jordan  canonical  form? 

8.6.16. True or false: If $\mathbf{w}_1, \ldots, \mathbf{w}_j$ is a Jordan chain for a matrix $A$, so are the scalar
multiples $c\,\mathbf{w}_1, \ldots, c\,\mathbf{w}_j$ for all $c \ne 0$.

8.6.17. Let $A$ and $B$ be $n \times n$ matrices. According to Exercise 8.2.23, the matrix products $AB$
and $BA$ have the same eigenvalues. Do they have the same Jordan form?


8.6.18. True or false: If the Jordan canonical form of $A$ is $J$, then that of $A^2$ is $J^2$.

♦ 8.6.19. (a) Give an example of a matrix $A$ such that $A^2$ has an eigenvector that is not an
eigenvector of $A$. (b) Show that, in general, every eigenvalue of $A^2$ is the square of an
eigenvalue of $A$.


♦ 8.6.20. (a) Prove that a Jordan block matrix $J_{0,n}$ with zero diagonal entries is nilpotent,
as in Exercise 1.3.12. (b) Prove that a Jordan matrix is nilpotent if and only if all its
diagonal entries are zero. (c) Prove that a matrix is nilpotent if and only if its Jordan
canonical form is nilpotent. (d) Explain why a matrix is nilpotent if and only if
its only eigenvalue is 0.

8.6.21. Let $J$ be a Jordan matrix. (a) Prove that $J^k$ is a complete matrix for some $k \ge 1$ if
and only if either $J$ is diagonal, or $J$ is nilpotent with $J^k = O$. (b) Suppose that $A$ is an
incomplete matrix such that $A^k$ is complete for some $k \ge 2$. Prove that $A^k = O$, and hence
$A$ is nilpotent. (A simpler version of this problem appears in Exercise 8.3.8.)

♦ 8.6.22. Cayley-Hamilton Theorem: Let $p_A(\lambda) = \det(A - \lambda I)$ be the characteristic polynomial
of $A$. (a) Prove that if $D$ is a diagonal matrix, then† $p_D(D) = O$.
Hint: Leave $p_D(\lambda)$ in factored form. (b) Prove that if $A$ is complete, then $p_A(A) = O$.
(c) Prove that if $J$ is a Jordan block, then $p_J(J) = O$. (d) Prove that this also holds if $J$ is
a Jordan matrix. (e) Prove the Cayley-Hamilton Theorem: a square matrix $A$ satisfies its
own characteristic equation: $p_A(A) = O$.


♦ 8.6.23. Minimal polynomial: Let $A$ be an $n \times n$ matrix. By definition, the minimal polynomial
of $A$ is the monic polynomial $m_A(t) = t^k + c_{k-1} t^{k-1} + \cdots + c_1 t + c_0$ of minimal
degree $k$ that annihilates $A$, so $m_A(A) = A^k + c_{k-1} A^{k-1} + \cdots + c_1 A + c_0 I = O$.
(a) Prove that the monic minimal polynomial $m_A$ is unique. (b) (c) Prove that if $r(t)$ is
any other polynomial such that $r(A) = O$, then $r(t) = q(t)\,m_A(t)$ for some polynomial $q(t)$.
(d) Prove that the matrix’s minimal polynomial is a factor of its characteristic polynomial,
so $p_A(t) = q_A(t)\,m_A(t)$ for some polynomial $q_A(t)$. Hint: Use the Cayley-Hamilton
Theorem in Exercise 8.6.22. (e) Prove that if $A$ has all distinct eigenvalues, then $p_A = m_A$.
(f) Prove that $p_A = m_A$ if and only if no two Jordan blocks have the same eigenvalue.


0  8.6.24.  Prove  Lemma  8.54. 

0  8.6.25.  Prove  that  the  n  vectors  constructed  in  the  proof  of  Theorem  8.51  are  linearly 
independent  and  hence  form  a  Jordan  basis. 

Hint: Suppose that some linear combination vanishes. Apply $B$ to the equation, and then
use the fact that we started with a Jordan basis for $W = \operatorname{img} B$.


8.6.26.  Find  all  invariant  subspaces  of  the  following  matrices:  (a) 


>  (b) 


(0 

-1 

(  3 

0 

1\ 

(c) 

1 

1 

-1  > 

(d) 

0 

1 

o  ,  (e) 

V3 

3 

-4/ 

\o 

0 

3/ 

1 

4 


(1 

0 

1 

°\ 

(-1 

0 

1 

2  \ 

0 

1 

0 

0 

,  (f) 

0 

1 

0 

1 

0 

0 

1 

0 

-1 

-4 

1 

-2 

\0 

1 

0 

1 J 

^  0 

1 

0 

l) 

♥ 8.6.27. The method for constructing a Jordan basis in Example 8.52 is simplified due to the
fact that each eigenvalue admits only one Jordan block. On the other hand, the method
used in the proof of Theorem 8.51 is rather impractical. Devise a general method for
constructing a Jordan basis for an arbitrary matrix, paying careful attention to how the
Jordan chains having the same eigenvalue are found.


† See Exercise 1.2.35 for the basics of matrix polynomials.
8.7 Singular Values

We have already expounded on the central role played by the eigenvalues and eigenvectors
of a square matrix in both theory and applications. Further evidence will appear in the
subsequent chapters. Alas, rectangular matrices do not have eigenvalues (why?), and so,
at first glance, do not appear to possess any quantities of comparable significance. But
you no doubt recall that our earlier treatment of least squares minimization problems,
as well as the equilibrium equations for structures and circuits, made essential use of the
symmetric, positive semi-definite square Gram matrix $K = A^T A$, which can be naturally
formed even when $A$ is not square. Perhaps the eigenvalues of $K$ might play a comparably
important role for general matrices. Since they are not easily related to the eigenvalues of
$A$ (which, in the non-square case, don’t even exist), we shall endow them with a new
name. They were first studied by the German mathematician Erhard Schmidt in the early
days of the twentieth century, although intimations can be found in Gauss’s work on rigid
body dynamics.

Definition 8.60. The singular values $\sigma_1, \ldots, \sigma_r$ of an $m \times n$ matrix $A$ are the positive
square roots, $\sigma_i = \sqrt{\lambda_i} > 0$, of the nonzero eigenvalues of the associated Gram matrix
$K = A^T A$. The corresponding eigenvectors of $K$ are known as the singular vectors of $A$.


Since $K$ is necessarily positive semi-definite, its eigenvalues are always non-negative,
$\lambda_i \geq 0$, which justifies the positivity of the singular values of $A$, independently of
whether $A$ itself has positive, negative, or even complex eigenvalues, or is rectangular and
has no eigenvalues at all. We will follow the standard convention, and always label the
singular values in decreasing order, so that $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$. Thus, $\sigma_1$ will always
denote the largest, or dominant, singular value. If $K = A^T A$ has repeated eigenvalues, the
singular values of $A$ are repeated with the same multiplicities. As we will see, the number
$r$ of singular values is equal to the rank of the matrices $A$ and $K$.

Warning.  Some  texts  include  the  zero  eigenvalues  of  K  as  singular  values  of  A.  We  find 
this  to  be  less  convenient  for  our  development,  but  you  should  be  aware  of  the  differences 
between  the  two  conventions. 


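Example 8.61. Let $A = \begin{pmatrix} 3 & 5 \\ 4 & 0 \end{pmatrix}$. The associated Gram matrix

$$K = A^T A = \begin{pmatrix} 3 & 4 \\ 5 & 0 \end{pmatrix} \begin{pmatrix} 3 & 5 \\ 4 & 0 \end{pmatrix} = \begin{pmatrix} 25 & 15 \\ 15 & 25 \end{pmatrix}$$

has eigenvalues $\lambda_1 = 40$, $\lambda_2 = 10$, and corresponding eigenvectors $\mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, $\mathbf{v}_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$.
Thus, the singular values of $A$ are $\sigma_1 = \sqrt{40} \approx 6.3246$ and $\sigma_2 = \sqrt{10} \approx 3.1623$, with $\mathbf{v}_1, \mathbf{v}_2$
being the singular vectors. Note that the singular values are not its eigenvalues, which are
$\lambda_1 = \frac12 (3 + \sqrt{89}) \approx 6.2170$ and $\lambda_2 = \frac12 (3 - \sqrt{89}) \approx -3.2170$, nor are the singular vectors
eigenvectors of $A$.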
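The numbers in Example 8.61 can be checked with a short numerical sketch. It assumes NumPy is available; the helper name `singular_values` and the tolerance `tol` (used to decide which eigenvalues of the Gram matrix count as zero) are illustrative choices, not part of the text.

```python
import numpy as np

def singular_values(A, tol=1e-12):
    """Square roots of the nonzero eigenvalues of the Gram matrix K = A^T A."""
    A = np.asarray(A, dtype=float)
    K = A.T @ A                          # symmetric, positive semi-definite
    lam = np.linalg.eigvalsh(K)          # real eigenvalues in ascending order
    cutoff = tol * max(lam[-1], 1.0)     # eigenvalues below this count as zero
    lam = lam[lam > cutoff]
    return np.sqrt(lam)[::-1]            # decreasing: sigma_1 >= ... >= sigma_r > 0

A = np.array([[3.0, 5.0],
              [4.0, 0.0]])
sigma = singular_values(A)               # approximately sqrt(40), sqrt(10)
eigenvalues = np.linalg.eigvals(A)       # (3 +/- sqrt(89)) / 2, not the singular values
print(sigma, eigenvalues)
```

Following the convention of Definition 8.60, the helper returns only the nonzero singular values, so the length of its output is the rank of the matrix.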


Only  in  the  special  case  of  symmetric  matrices  is  there  a  direct  connection  between  their 
singular  values  and  their  (necessarily  real)  eigenvalues. 


Proposition 8.62. If $A = A^T$ is a symmetric matrix, its singular values are the absolute
values of its nonzero eigenvalues: $\sigma_i = |\lambda_i| > 0$; its singular vectors coincide with its
non-null eigenvectors.



Proof: When $A$ is symmetric, $K = A^T A = A^2$. So, if
$A \mathbf{v} = \lambda \mathbf{v}$, then $K \mathbf{v} = A^2 \mathbf{v} = A (A \mathbf{v}) = \lambda A \mathbf{v} = \lambda^2 \mathbf{v}$.

Thus, every eigenvector $\mathbf{v}$ of $A$ is also an eigenvector of $K$ with eigenvalue $\lambda^2$. Therefore,
the eigenvector basis of $A$ guaranteed by Theorem 8.32 is also an eigenvector basis for $K$,
and hence forms a complete system of singular vectors for $A$. Q.E.D.
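A quick numerical illustration of Proposition 8.62, assuming NumPy; the symmetric matrix below is arbitrary test data chosen for this sketch and does not come from the text.

```python
import numpy as np

# Hypothetical symmetric test matrix (not from the text).
A = np.array([[2.0,  1.0, 0.0],
              [1.0, -3.0, 1.0],
              [0.0,  1.0, 0.5]])

eigs = np.linalg.eigvalsh(A)                  # real eigenvalues of the symmetric matrix
sing = np.linalg.svd(A, compute_uv=False)     # singular values, in decreasing order
abs_eigs = np.sort(np.abs(eigs))[::-1]        # |eigenvalues|, in decreasing order

print(np.allclose(sing, abs_eigs))            # singular values equal |eigenvalues|
```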

The  generalization  of  the  spectral  factorization  (8.35)  to  non-symmetric  matrices  is 
known  as  the  singular  value  decomposition ,  commonly  abbreviated  SVD.  Unlike  the  former, 
which  applies  only  to  square  matrices,  every  nonzero  matrix  possesses  a  singular  value 
decomposition. 

Theorem 8.63. A nonzero real $m \times n$ matrix $A$ of rank $r > 0$ can be factored,

$$A = P \Sigma Q^T, \tag{8.52}$$

into the product of an $m \times r$ matrix $P$ with orthonormal† columns, so $P^T P = I$, the $r \times r$
diagonal matrix $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$ that has the singular values of $A$ as its diagonal
entries, and an $r \times n$ matrix $Q^T$ with orthonormal rows, so $Q^T Q = I$.

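Proof: Let's begin by rewriting the desired factorization (8.52) as $A Q = P \Sigma$. The individual
columns of this matrix equation are the vector equations

$$A \mathbf{q}_i = \sigma_i \mathbf{p}_i, \qquad i = 1, \ldots, r, \tag{8.53}$$

relating the orthonormal columns of $Q = ( \mathbf{q}_1 \ \mathbf{q}_2 \ \ldots \ \mathbf{q}_r )$ to the orthonormal columns of
$P = ( \mathbf{p}_1 \ \mathbf{p}_2 \ \ldots \ \mathbf{p}_r )$. Thus, our goal is to find vectors $\mathbf{p}_1, \ldots, \mathbf{p}_r$ and $\mathbf{q}_1, \ldots, \mathbf{q}_r$ that satisfy
(8.53). To this end, we let $\mathbf{q}_1, \ldots, \mathbf{q}_r$ be orthonormal eigenvectors of the Gram matrix
$K = A^T A$ corresponding to the non-zero eigenvalues, which, according to Proposition 8.37,
form a basis for $\operatorname{img} K = \operatorname{coimg} A$, of dimension $r = \operatorname{rank} A^T = \operatorname{rank} A$. Thus, by the
definition of the singular values,

$$A^T A \mathbf{q}_i = K \mathbf{q}_i = \sigma_i^2 \mathbf{q}_i, \qquad i = 1, \ldots, r. \tag{8.54}$$

We claim that the image vectors $\mathbf{w}_i = A \mathbf{q}_i$ are automatically orthogonal. Indeed, in
view of the orthonormality of the $\mathbf{q}_i$ combined with (8.54),

$$\mathbf{w}_i \cdot \mathbf{w}_j = \mathbf{w}_i^T \mathbf{w}_j = (A \mathbf{q}_i)^T A \mathbf{q}_j = \mathbf{q}_i^T A^T A \mathbf{q}_j = \sigma_j^2 \, \mathbf{q}_i^T \mathbf{q}_j = \sigma_j^2 \, \mathbf{q}_i \cdot \mathbf{q}_j = \begin{cases} 0, & i \neq j, \\ \sigma_i^2, & i = j. \end{cases}$$

Consequently, $\mathbf{w}_1, \ldots, \mathbf{w}_r$ form an orthogonal system of vectors having respective norms

$$\| \mathbf{w}_i \| = \sqrt{\mathbf{w}_i \cdot \mathbf{w}_i} = \sigma_i.$$

We conclude that the associated unit vectors

$$\mathbf{p}_i = \frac{\mathbf{w}_i}{\sigma_i} = \frac{A \mathbf{q}_i}{\sigma_i}, \qquad i = 1, \ldots, r, \tag{8.55}$$

form an orthonormal set of vectors satisfying the required equations (8.53). Q.E.D.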
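The construction used in the proof can be sketched numerically as follows. This is a minimal illustration assuming NumPy; the function name `reduced_svd` and the tolerance `tol` for discarding numerically zero eigenvalues are choices made for this sketch. Production code would normally call a dedicated SVD routine rather than form $A^T A$ explicitly.

```python
import numpy as np

def reduced_svd(A, tol=1e-12):
    """Return P (m x r), sigma (r,), Q (n x r) with A = P diag(sigma) Q^T."""
    A = np.asarray(A, dtype=float)
    lam, V = np.linalg.eigh(A.T @ A)          # orthonormal eigenvectors of K = A^T A
    order = np.argsort(lam)[::-1]             # reorder eigenvalues decreasingly
    lam, V = lam[order], V[:, order]
    keep = lam > tol * max(lam[0], 1.0)       # drop the (numerically) zero eigenvalues
    sigma = np.sqrt(lam[keep])                # singular values sigma_1 >= ... >= sigma_r
    Q = V[:, keep]                            # singular vectors q_1, ..., q_r
    P = (A @ Q) / sigma                       # columns p_i = A q_i / sigma_i, as in (8.55)
    return P, sigma, Q

A = np.array([[3.0, 5.0],
              [4.0, 0.0]])
P, sigma, Q = reduced_svd(A)
print(np.allclose(P @ np.diag(sigma) @ Q.T, A))    # factorization (8.52)
print(np.allclose(P.T @ P, np.eye(len(sigma))))    # orthonormal columns of P
```

The signs of the computed columns may differ from a hand computation, since each pair $\mathbf{q}_i, \mathbf{p}_i$ is determined only up to a simultaneous change of sign.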


† Throughout this section, we exclusively use the Euclidean dot product and norm.

Remark. If $A$ has distinct singular values, its singular value decomposition (8.52) is almost
unique, modulo simultaneously changing the signs of one or more corresponding columns of
$Q$ and $P$. Matrices with repeated singular values have more freedom in their singular value
decomposition, since one can choose singular vectors using different orthonormal bases of
each eigenspace of the Gram matrix $A^T A$.

Observe that, taking the transpose of (8.52) and noting that $\Sigma^T = \Sigma$ is diagonal, we
obtain

$$A^T = Q \Sigma P^T, \tag{8.56}$$

which is a singular value decomposition of the transposed matrix $A^T$. In particular, we
obtain the following result:


Corollary 8.64. A matrix $A$ and its transpose $A^T$ have the same singular values.


Note that their singular vectors are not the same; indeed, those of $A$ are the orthonormal
columns of $Q$, whereas those of $A^T$ are the orthonormal columns of $P$, which are related
by (8.53). Thus,

$$A^T \mathbf{p}_i = \sigma_i \mathbf{q}_i, \qquad i = 1, \ldots, r, \tag{8.57}$$

which is also a consequence of (8.54).

Furthermore, the singular value decomposition (8.52) serves to diagonalize the Gram
matrix; indeed, since $P^T P = I$ and $Q^T Q = I$, we have $P^T A Q = P^T P \Sigma Q^T Q = \Sigma$, while
$P P^T A = P P^T P \Sigma Q^T = A$, and hence

$$Q^T K Q = Q^T A^T A Q = Q^T A^T P P^T A Q = (P^T A Q)^T (P^T A Q) = \Sigma^T \Sigma = \Sigma^2, \tag{8.58}$$

because $\Sigma$ is diagonal. If $A$ has maximal rank $n$, then $Q$ is an $n \times n$ orthogonal matrix,
and so (8.58) implies that the linear transformation defined by the Gram matrix $K$ is
diagonalized when expressed in terms of the orthonormal basis formed by the singular
vectors. If $r = \operatorname{rank} A < n$, then one can supplement the $r$ singular vectors $\mathbf{q}_1, \ldots, \mathbf{q}_r$
with $n - r$ unit vectors $\mathbf{q}_{r+1}, \ldots, \mathbf{q}_n \in \ker K = \ker A$ so as to form an orthonormal basis
of $\mathbb{R}^n$. In terms of this basis, the Gram matrix assumes diagonal form with the $r$ nonzero
squared singular values (nonzero eigenvalues of $K$) as its first $r$ diagonal entries, while the
remaining diagonal entries are all 0, in accordance with the Spectral Theorem 8.38.


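Example 8.65. For the matrix $A = \begin{pmatrix} 3 & 5 \\ 4 & 0 \end{pmatrix}$ in Example 8.61, the orthonormal eigenvector
basis of $K = A^T A = \begin{pmatrix} 25 & 15 \\ 15 & 25 \end{pmatrix}$ is given by the unit singular vectors

$$\mathbf{q}_1 = \begin{pmatrix} \frac{1}{\sqrt2} \\ \frac{1}{\sqrt2} \end{pmatrix} \quad \text{and} \quad \mathbf{q}_2 = \begin{pmatrix} -\frac{1}{\sqrt2} \\ \frac{1}{\sqrt2} \end{pmatrix}. \quad \text{Thus,} \quad Q = \begin{pmatrix} \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} \\ \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \end{pmatrix}.$$

Next, according to (8.55),

$$\mathbf{p}_1 = \frac{A \mathbf{q}_1}{\sigma_1} = \frac{1}{\sqrt{40}} \begin{pmatrix} 4\sqrt2 \\ 2\sqrt2 \end{pmatrix} = \begin{pmatrix} \frac{2}{\sqrt5} \\ \frac{1}{\sqrt5} \end{pmatrix}, \qquad \mathbf{p}_2 = \frac{A \mathbf{q}_2}{\sigma_2} = \frac{1}{\sqrt{10}} \begin{pmatrix} \sqrt2 \\ -2\sqrt2 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt5} \\ -\frac{2}{\sqrt5} \end{pmatrix},$$

and thus $P = \begin{pmatrix} \frac{2}{\sqrt5} & \frac{1}{\sqrt5} \\ \frac{1}{\sqrt5} & -\frac{2}{\sqrt5} \end{pmatrix}$. You may wish to validate the resulting singular value factorization

$$A = \begin{pmatrix} 3 & 5 \\ 4 & 0 \end{pmatrix} = \begin{pmatrix} \frac{2}{\sqrt5} & \frac{1}{\sqrt5} \\ \frac{1}{\sqrt5} & -\frac{2}{\sqrt5} \end{pmatrix} \begin{pmatrix} \sqrt{40} & 0 \\ 0 & \sqrt{10} \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\ -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} \end{pmatrix} = P \Sigma Q^T.$$

The singular value decomposition reveals some interesting new geometrical information
concerning the action of matrices, further supplementing the discussion begun in
Section 2.5 and continued in Section 4.4.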

Proposition 8.66. Given the singular value decomposition $A = P \Sigma Q^T$, the columns
$\mathbf{q}_1, \ldots, \mathbf{q}_r$ of $Q$ form an orthonormal basis for $\operatorname{coimg} A$, while the columns $\mathbf{p}_1, \ldots, \mathbf{p}_r$ of $P$
form an orthonormal basis for $\operatorname{img} A$.


Proof: The first part of the proposition was proved during the course of the proof of
Theorem 8.63. Moreover, the vectors $\mathbf{p}_i = A(\mathbf{q}_i / \sigma_i)$ for $i = 1, \ldots, r$ are mutually orthogonal,
of unit length, and belong to $\operatorname{img} A$, which has dimension $r = \operatorname{rank} A$. They therefore
form an orthonormal basis for the image. Q.E.D.


If $A$ is a nonsingular $n \times n$ matrix, then the matrices $P, \Sigma, Q$ appearing in its singular
value decomposition (8.52) are all of size $n \times n$. If we interpret them as linear transformations
of $\mathbb{R}^n$, the two orthogonal matrices represent rigid rotations/reflections, while the
diagonal matrix $\Sigma$ represents a combination of simple stretches, by an amount given by
the singular values, in the orthogonal coordinate directions. Thus, every invertible linear
transformation on $\mathbb{R}^n$ can be decomposed into a rotation/reflection $Q^T$, followed by the
stretching transformation along the coordinate axes represented by $\Sigma$, followed by another
rotation/reflection $P$. See also Exercise 8.5.28.

In the more general rectangular case, the matrix $Q^T$ represents an orthogonal projection
from $\mathbb{R}^n$ to $\operatorname{coimg} A$, the matrix $\Sigma$ continues to represent a stretching transformation within
this $r$-dimensional subspace, while $P$ maps the result to $\operatorname{img} A \subset \mathbb{R}^m$. We already noted in
Section 4.4 that the linear transformation $L \colon \mathbb{R}^n \to \mathbb{R}^m$ defined by matrix multiplication,
$L[\mathbf{x}] = A \mathbf{x}$, can be interpreted as a projection from $\mathbb{R}^n$ to $\operatorname{coimg} A$ followed by an invertible
map from $\operatorname{coimg} A$ to $\operatorname{img} A$. The singular value decomposition tells us that not only is the
latter map invertible, it is simply a combination of stretches in the $r$ mutually orthogonal
singular directions $\mathbf{q}_1, \ldots, \mathbf{q}_r$, whose magnitudes equal the nonzero singular values. In this
way, we have at last reached a complete understanding of the subtle geometry underlying
the simple operation of multiplying a vector by a matrix!

Finally,  we  note  that  practical  numerical  algorithms  for  computing  singular  values  and 
the  singular  value  decomposition  can  be  found  in  Chapter  9  and  in  [32,  66,  90]. 


The  Pseudoinverse 


The  singular  value  decomposition  enables  us  to  substantially  generalize  the  concept  of 
a  matrix  inverse.  The  pseudoinverse  was  first  defined  by  the  American  mathematician 
Eliakim  Moore  in  the  1920’s  and  rediscovered  by  the  British  mathematical  physicist  Sir 
Roger  Penrose  in  the  1950’s,  and  often  has  their  names  attached. 

Definition 8.67. The pseudoinverse of a nonzero $m \times n$ matrix with singular value
decomposition $A = P \Sigma Q^T$ is the $n \times m$ matrix $A^+ = Q \Sigma^{-1} P^T$.


Note that the latter equation is the singular value decomposition of the pseudoinverse
$A^+$, and hence its nonzero singular values are the reciprocals of the nonzero singular values
of $A$. The only matrix without a pseudoinverse is the zero matrix $O$. If $A$ is a non-singular
square matrix, then its pseudoinverse agrees with its ordinary inverse. Indeed, since in this
case both $P$ and $Q$ are square orthogonal matrices, it follows that

$$A^{-1} = (P \Sigma Q^T)^{-1} = (Q^{-1})^T \Sigma^{-1} P^{-1} = Q \Sigma^{-1} P^T = A^+,$$

where we used the fact that the inverse of an orthogonal matrix is equal to its transpose.
More generally, if $A$ has linearly independent columns, or, equivalently, $\ker A = \{ \mathbf{0} \}$, then
we can bypass the singular value decomposition to compute its pseudoinverse.

Lemma 8.68. Let $A$ be an $m \times n$ matrix of rank $n$. Then

$$A^+ = (A^T A)^{-1} A^T. \tag{8.59}$$

Proof: Replacing $A$ by its singular value decomposition (8.52), we find

$$A^T A = (P \Sigma Q^T)^T (P \Sigma Q^T) = Q \Sigma P^T P \Sigma Q^T = Q \Sigma^2 Q^T, \tag{8.60}$$

since $\Sigma = \Sigma^T$ is a diagonal matrix, while $P^T P = I$, since the columns of $P$ are orthonormal.
This is merely the spectral factorization (8.35) of the Gram matrix $A^T A$, which we in
fact already knew from the original definition of the singular values and vectors. Now if $A$
has rank $n$, then $Q$ is an $n \times n$ orthogonal matrix, and so $Q^{-1} = Q^T$. Therefore,

$$(A^T A)^{-1} A^T = (Q \Sigma^2 Q^T)^{-1} (P \Sigma Q^T)^T = (Q \Sigma^{-2} Q^T)(Q \Sigma P^T) = Q \Sigma^{-1} P^T = A^+. \qquad \text{Q.E.D.}$$
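As a sanity check on Lemma 8.68, the following sketch compares formula (8.59) with the definition $A^+ = Q \Sigma^{-1} P^T$ and with NumPy's built-in pseudoinverse. The $3 \times 2$ matrix is hypothetical test data with linearly independent columns, chosen only for illustration.

```python
import numpy as np

# Hypothetical 3 x 2 matrix of rank 2 (not from the text).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

A_plus_formula = np.linalg.inv(A.T @ A) @ A.T            # formula (8.59)

P, sigma, Qt = np.linalg.svd(A, full_matrices=False)     # A = P diag(sigma) Qt
A_plus_svd = Qt.T @ np.diag(1.0 / sigma) @ P.T           # Definition 8.67

print(np.allclose(A_plus_formula, A_plus_svd))
print(np.allclose(A_plus_svd, np.linalg.pinv(A)))
print(np.allclose(A_plus_svd @ A, np.eye(2)))            # A^+ is a left inverse here
```

Formula (8.59) requires $A^T A$ to be invertible, so it applies only when the columns of $A$ are linearly independent; the SVD-based definition has no such restriction.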

If $A$ is square and nonsingular, then, as we know, the solution to the linear system
$A \mathbf{x} = \mathbf{b}$ is given by $\mathbf{x}^\star = A^{-1} \mathbf{b}$. For a general coefficient matrix, the vector $\mathbf{x}^\star = A^+ \mathbf{b}$
obtained by applying the pseudoinverse to the right-hand side plays a distinguished role:
it is, in fact, the least squares solution to the system under the Euclidean norm.


Theorem 8.69. Consider the linear system $A \mathbf{x} = \mathbf{b}$. Let $\mathbf{x}^\star = A^+ \mathbf{b}$, where $A^+$ is the
pseudoinverse of $A$. If $\ker A = \{ \mathbf{0} \}$, then $\mathbf{x}^\star$ is the (Euclidean) least squares solution to
the linear system. If, more generally, $\ker A \neq \{ \mathbf{0} \}$, then $\mathbf{x}^\star = A^+ \mathbf{b} \in \operatorname{coimg} A$ is the least
squares solution that has the minimal Euclidean norm among all vectors that minimize the
least squares error $\| A \mathbf{x} - \mathbf{b} \|_2$.


Proof: To show that $\mathbf{x}^\star = A^+ \mathbf{b}$ is the least squares solution to the system, we must,
according to Theorem 5.11, check that it satisfies the normal equations $A^T A \mathbf{x}^\star = A^T \mathbf{b}$. If
$\operatorname{rank} A = n$, so $A^T A$ is nonsingular, this follows immediately from (8.59). More generally,
combining (8.60), the definition of the pseudoinverse, and the fact that $Q$ has orthonormal
columns, so $Q^T Q = I$, yields

$$A^T A \mathbf{x}^\star = A^T A A^+ \mathbf{b} = (Q \Sigma^2 Q^T)(Q \Sigma^{-1} P^T) \mathbf{b} = Q \Sigma P^T \mathbf{b} = A^T \mathbf{b}.$$

This proves that $\mathbf{x}^\star$ solves the normal equations, and hence, by Theorem 5.11 and Exercise
5.4.11, minimizes the least squares error. Moreover,

$$\mathbf{x}^\star = A^+ \mathbf{b} = Q \Sigma^{-1} P^T \mathbf{b} = Q \mathbf{c} = c_1 \mathbf{q}_1 + \cdots + c_r \mathbf{q}_r, \qquad \text{where} \quad \mathbf{c} = (c_1, \ldots, c_r)^T = \Sigma^{-1} P^T \mathbf{b}.$$

Thus, $\mathbf{x}^\star$ is a linear combination of the singular vectors, and hence, by Proposition 8.66
and Theorem 4.50, $\mathbf{x}^\star \in \operatorname{coimg} A$ is the solution with minimal norm; the most general least
squares solution has the form $\mathbf{x} = \mathbf{x}^\star + \mathbf{z}$ for arbitrary $\mathbf{z} \in \ker A$. Q.E.D.


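Example 8.70. Let us use the pseudoinverse to solve the linear system $A \mathbf{x} = \mathbf{b}$, with

$$A = \begin{pmatrix} 1 & 2 & -1 \\ 3 & -4 & 1 \\ -1 & 3 & -1 \\ 2 & -1 & 0 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 1 \\ 0 \\ -1 \\ 2 \end{pmatrix}.$$

In this case, $\ker A \neq \{ \mathbf{0} \}$, and so we are not able to use the simpler formula (8.59); thus,
we begin by establishing the singular value decomposition of $A$. The corresponding Gram
matrix

$$K = A^T A = \begin{pmatrix} 15 & -15 & 3 \\ -15 & 30 & -9 \\ 3 & -9 & 3 \end{pmatrix}$$

has eigenvalues and eigenvectors

$$\lambda_1 = 24 + 3\sqrt{34} \approx 41.4929, \qquad \lambda_2 = 24 - 3\sqrt{34} \approx 6.5071, \qquad \lambda_3 = 0,$$

$$\mathbf{v}_1 \approx \begin{pmatrix} 2.1324 \\ -3.5662 \\ 1 \end{pmatrix}, \qquad \mathbf{v}_2 \approx \begin{pmatrix} -2.5324 \\ -1.2338 \\ 1 \end{pmatrix}, \qquad \mathbf{v}_3 = \begin{pmatrix} .2 \\ .4 \\ 1 \end{pmatrix}.$$

The singular values are the square roots of the positive eigenvalues, and so

$$\sigma_1 = \sqrt{\lambda_1} \approx 6.4415, \qquad \sigma_2 = \sqrt{\lambda_2} \approx 2.5509,$$

which are used to construct the diagonal singular value matrix

$$\Sigma \approx \begin{pmatrix} 6.4415 & 0 \\ 0 & 2.5509 \end{pmatrix}.$$

Note that $A$ has rank 2, because it has just two singular values. The first two eigenvectors
of $K$ are the singular vectors of $A$, and we use the normalized (unit) singular vectors to
form the columns of

$$Q = ( \mathbf{q}_1 \ \mathbf{q}_2 ) \approx \begin{pmatrix} .4990 & -.8472 \\ -.8344 & -.4128 \\ .2340 & .3345 \end{pmatrix}.$$

Next, we apply $A$ to the singular vectors and divide by the corresponding singular value as in (8.55);
the resulting vectors

$$\mathbf{p}_1 = \frac{A \mathbf{q}_1}{\sigma_1} \approx \begin{pmatrix} -.2180 \\ .7869 \\ -.5024 \\ .2845 \end{pmatrix}, \qquad \mathbf{p}_2 = \frac{A \mathbf{q}_2}{\sigma_2} \approx \begin{pmatrix} -.7869 \\ -.2180 \\ -.2845 \\ -.5024 \end{pmatrix},$$

will form the orthonormal columns of $P = ( \mathbf{p}_1 \ \mathbf{p}_2 )$, and, as you can verify,

$$A = P \Sigma Q^T \approx \begin{pmatrix} -.2180 & -.7869 \\ .7869 & -.2180 \\ -.5024 & -.2845 \\ .2845 & -.5024 \end{pmatrix} \begin{pmatrix} 6.4415 & 0 \\ 0 & 2.5509 \end{pmatrix} \begin{pmatrix} .4990 & -.8344 & .2340 \\ -.8472 & -.4128 & .3345 \end{pmatrix}$$

is the singular value decomposition of our coefficient matrix. Its pseudoinverse is immediately
computed:

$$A^+ = Q \Sigma^{-1} P^T \approx \begin{pmatrix} .2444 & .1333 & .0556 & .1889 \\ .1556 & -.0667 & .1111 & .0444 \\ -.1111 & 0 & -.0556 & -.0556 \end{pmatrix}.$$

Finally, according to Theorem 8.69, the least squares solution to the original linear system
of minimal Euclidean norm is

$$\mathbf{x}^\star = A^+ \mathbf{b} \approx \begin{pmatrix} .5667 \\ .1333 \\ -.1667 \end{pmatrix}.$$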

The  Euclidean  Matrix  Norm 


Singular values allow us to finally write down a formula for the natural matrix norm induced
by the Euclidean norm (or 2 norm) on $\mathbb{R}^n$, as defined in Theorem 3.20.

Theorem 8.71. Let $\| \cdot \|_2$ denote the Euclidean norm on $\mathbb{R}^n$. Let $A$ be a nonzero matrix
with singular values $\sigma_1 \geq \cdots \geq \sigma_r$. Then the Euclidean matrix norm of $A$ equals its
dominant (largest) singular value:

$$\| A \|_2 = \max \{ \, \| A \mathbf{u} \|_2 \; | \; \| \mathbf{u} \|_2 = 1 \, \} = \sigma_1, \qquad \text{while} \quad \| O \|_2 = 0. \tag{8.61}$$


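Proof: Let $\mathbf{q}_1, \ldots, \mathbf{q}_n$ be an orthonormal basis of $\mathbb{R}^n$ consisting of the singular vectors
$\mathbf{q}_1, \ldots, \mathbf{q}_r$ along with an orthonormal basis $\mathbf{q}_{r+1}, \ldots, \mathbf{q}_n$ of $\ker A$. Thus, by (8.53),

$$A \mathbf{q}_i = \begin{cases} \sigma_i \mathbf{p}_i, & i = 1, \ldots, r, \\ \mathbf{0}, & i = r + 1, \ldots, n, \end{cases}$$

where $\mathbf{p}_1, \ldots, \mathbf{p}_r$ form an orthonormal basis for $\operatorname{img} A$. Suppose $\mathbf{u}$ is any unit vector, so

$$\mathbf{u} = c_1 \mathbf{q}_1 + \cdots + c_n \mathbf{q}_n, \qquad \text{where} \quad \| \mathbf{u} \|_2 = \sqrt{c_1^2 + \cdots + c_n^2} = 1,$$

thanks to the orthonormality of the basis vectors $\mathbf{q}_1, \ldots, \mathbf{q}_n$ and the general Pythagorean
formula (4.5). Then

$$A \mathbf{u} = c_1 \sigma_1 \mathbf{p}_1 + \cdots + c_r \sigma_r \mathbf{p}_r, \qquad \text{and hence} \quad \| A \mathbf{u} \|_2 = \sqrt{c_1^2 \sigma_1^2 + \cdots + c_r^2 \sigma_r^2},$$

since $\mathbf{p}_1, \ldots, \mathbf{p}_r$ are also orthonormal. Now, since $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$, we have

$$\| A \mathbf{u} \|_2 = \sqrt{c_1^2 \sigma_1^2 + \cdots + c_r^2 \sigma_r^2} \leq \sqrt{c_1^2 \sigma_1^2 + \cdots + c_r^2 \sigma_1^2} = \sigma_1 \sqrt{c_1^2 + \cdots + c_r^2} \leq \sigma_1 \sqrt{c_1^2 + \cdots + c_n^2} = \sigma_1.$$

Moreover, if $c_1 = 1$, $c_2 = \cdots = c_n = 0$, then $\mathbf{u} = \mathbf{q}_1$, and hence $\| A \mathbf{u} \|_2 = \| A \mathbf{q}_1 \|_2 =
\| \sigma_1 \mathbf{p}_1 \|_2 = \sigma_1$. This implies the desired formula (8.61). Q.E.D.


Example 8.72. Consider the matrix $A = \begin{pmatrix} 0 & -\frac13 & \frac13 \\ \frac14 & 0 & \frac12 \\ \frac25 & \frac15 & 0 \end{pmatrix}$. The corresponding Gram matrix

$$A^T A \approx \begin{pmatrix} .2225 & .0800 & .1250 \\ .0800 & .1511 & -.1111 \\ .1250 & -.1111 & .3611 \end{pmatrix}$$

has eigenvalues $\lambda_1 \approx .4472$, $\lambda_2 \approx .2665$, $\lambda_3 \approx .0210$, and hence the singular values of $A$ are
their square roots: $\sigma_1 \approx .6687$, $\sigma_2 \approx .5163$, $\sigma_3 \approx .1448$. The Euclidean matrix norm of $A$
is the largest singular value, and so $\| A \|_2 \approx .6687$.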
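Theorem 8.71 is easy to test numerically. The sketch below (assuming NumPy) uses a matrix whose Gram matrix matches the one displayed in Example 8.72; the sign pattern of its first row is an assumption, although it does not affect the singular values. The random sampling of unit vectors is only a crude illustration that $\| A \mathbf{u} \|_2$ never exceeds $\sigma_1$; the seed and sample size are arbitrary.

```python
import numpy as np

# Matrix with the Gram matrix shown in Example 8.72 (first-row signs are an assumption).
A = np.array([[0.0, -1/3, 1/3],
              [1/4,  0.0, 1/2],
              [2/5,  1/5, 0.0]])

sigma = np.linalg.svd(A, compute_uv=False)     # singular values, decreasing
norm2 = np.linalg.norm(A, 2)                   # induced Euclidean matrix norm

# Sample many unit vectors u and record the largest ||A u||.
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 20000))
U /= np.linalg.norm(U, axis=0)                 # normalize each column to unit length
sampled_max = np.linalg.norm(A @ U, axis=0).max()

print(sigma)
print(norm2, sampled_max)
```

The sampled maximum approaches $\sigma_1$ from below, while the maximum itself is attained at the first singular vector $\mathbf{q}_1$.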


Condition  Number  and  Rank 

Not  only  do  the  singular  values  provide  a  compelling  geometric  interpretation  of  the  action 
of  a  matrix  on  vectors,  they  also  play  a  key  role  in  modern  computational  algorithms.  The 
relative  magnitudes  of  the  singular  values  can  be  used  to  distinguish  well-behaved  linear 
systems  from  ill-conditioned  systems,  which  are  more  challenging  to  solve  accurately.  This 
information  is  quantified  by  the  condition  number  of  the  matrix. 

Definition 8.73. The condition number of a nonsingular $n \times n$ matrix is the ratio between
its largest and smallest singular values: $\kappa(A) = \sigma_1 / \sigma_n$.

Since the number of singular values equals the matrix's rank, an $n \times n$ matrix with
fewer than $n$ singular values is singular, and is said to have condition number $\infty$. A matrix
with a very large condition number is close to singular, and designated as ill-conditioned;
in practical terms, this occurs when the condition number is larger than the reciprocal
of the machine's precision, e.g., $10^7$ for typical single-precision arithmetic. As the name
implies, it is much harder to solve a linear system $A \mathbf{x} = \mathbf{b}$ when its coefficient matrix is
ill-conditioned and hence close to singular.

Determining the rank of a large (square or rectangular) matrix can be a numerical
challenge. Small numerical inaccuracies can have an unpredictable effect. For example, the
rank 1 matrix

$$A = \begin{pmatrix} 1 & 1 & -1 \\ 2 & 2 & -2 \\ 3 & 3 & -3 \end{pmatrix} \quad \text{is very close to} \quad \widetilde{A} = \begin{pmatrix} 1.00001 & 1. & -1. \\ 2. & 2.00001 & -2. \\ 3. & 3. & -3.00001 \end{pmatrix},$$

which has rank 3 and so is nonsingular. On the other hand, the latter matrix is very
close to singular, and this is highlighted by its singular values, which are $\sigma_1 \approx 6.48075$
while $\sigma_2 \approx \sigma_3 \approx .00001$. The fact that the second and third singular values are very
small indicates that $\widetilde{A}$ is close to a matrix of rank 1 and should be viewed as a numerical
(or experimental) perturbation of such a matrix. Thus, an effective practical method for
computing the rank of a matrix is to first assign a threshold, e.g., $10^{-5}$, to distinguish
small singular values, and then treat any singular value lying below the threshold as if it
were zero.
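The thresholding idea can be sketched as follows, assuming NumPy. The helper name `numerical_rank` and the threshold value `1e-4` are choices made for this illustration; the threshold is picked so that both small singular values of the perturbed matrix fall below it.

```python
import numpy as np

def numerical_rank(A, threshold):
    """Count the singular values of A that exceed the given threshold."""
    sigma = np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False)
    return int(np.sum(sigma > threshold))

A = np.array([[1.0, 1.0, -1.0],
              [2.0, 2.0, -2.0],
              [3.0, 3.0, -3.0]])                        # exact rank 1
A_tilde = A + 1e-5 * np.diag([1.0, 1.0, -1.0])          # perturbed matrix from the text

sigma = np.linalg.svd(A_tilde, compute_uv=False)
print(sigma)                                            # one large, two tiny singular values
print(np.linalg.matrix_rank(A_tilde))                   # default tolerance reports rank 3
print(numerical_rank(A_tilde, threshold=1e-4))          # thresholded rank is 1
print(sigma[0] / sigma[-1])                             # condition number sigma_1 / sigma_n
```

The large ratio printed on the last line is the condition number of Definition 8.73, and signals that the perturbed matrix is nonsingular but ill-conditioned.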

This  idea  is  justified  by  the  following  theorem,  which  gives  a  mechanism  for  constructing 
the  closest  low  rank  approximations  to  a  given  matrix  A,  as  measured  in  the  Euclidean 
matrix  norm. 


Theorem 8.74. Let the $m \times n$ matrix $A$ have rank $r$ and singular value decomposition
$A = P \Sigma Q^T$. Given $1 \leq k \leq r$, let $\Sigma_k$ denote the upper left $k \times k$ diagonal submatrix of
$\Sigma$ containing the largest $k$ singular values on its diagonal. Let $Q_k$ denote the $n \times k$ matrix
formed from the first $k$ columns of $Q$, which are the first $k$ orthonormal singular vectors of
$A$, and let $P_k$ be the $m \times k$ matrix formed from the first $k$ columns of $P$. Then the $m \times n$
matrix $A_k = P_k \Sigma_k Q_k^T$ has rank $k$. Moreover, $A_k$ is the closest rank $k$ matrix to $A$ in the
sense that, among all $m \times n$ matrices $B$ of rank $k$, the Euclidean matrix norm $\| A - B \|_2$ is
minimized when $B = A_k$.


Proof: The fact that $A_k$ has rank $k$ is clear, since, by construction, its singular values
are $\sigma_1, \ldots, \sigma_k$. Let $\widehat{\Sigma}_k$ denote the $r \times r$ diagonal matrix whose first $k$ diagonal entries are
$\sigma_1, \ldots, \sigma_k$ and whose last $r - k$ diagonal entries are all 0. Clearly $A_k = P \widehat{\Sigma}_k Q^T$ since
the additional zero entries have no effect on the product. Moreover, $\Sigma - \widehat{\Sigma}_k$ is a diagonal
matrix whose first $k$ diagonal entries are all 0 and whose last $r - k$ diagonal entries are
$\sigma_{k+1}, \ldots, \sigma_r$. Thus, the difference $A - A_k = P (\Sigma - \widehat{\Sigma}_k) Q^T$ has singular values $\sigma_{k+1}, \ldots, \sigma_r$.
Since $\sigma_{k+1}$ is the largest of these, Theorem 8.71 implies that

$$\| A - A_k \|_2 = \sigma_{k+1}.$$

We now prove that this is the smallest possible among all $m \times n$ matrices $B$ of rank $k$.
For such a matrix, according to the Fundamental Theorem 2.49, $\dim \ker B = n - k$. Let
$V_{k+1} \subset \mathbb{R}^n$ denote the $(k + 1)$-dimensional subspace spanned by the first $k + 1$ singular
vectors $\mathbf{q}_1, \ldots, \mathbf{q}_{k+1}$ of $A$. Since the dimensions of the subspaces $V_{k+1}$ and $\ker B$ sum up
to $k + 1 + n - k = n + 1 > n$, their intersection is a nontrivial subspace, and hence we can
find a non-zero unit vector

$$\mathbf{u}^\star = c_1 \mathbf{q}_1 + \cdots + c_{k+1} \mathbf{q}_{k+1} \in V_{k+1} \cap \ker B.$$

Thus, since $\mathbf{q}_1, \ldots, \mathbf{q}_{k+1}$ are orthonormal,

$$\| \mathbf{u}^\star \|^2 = c_1^2 + \cdots + c_{k+1}^2 = 1, \qquad \text{and, moreover,} \quad B \mathbf{u}^\star = \mathbf{0},$$

which implies

$$(A - B) \mathbf{u}^\star = A \mathbf{u}^\star = c_1 A \mathbf{q}_1 + \cdots + c_{k+1} A \mathbf{q}_{k+1} = c_1 \sigma_1 \mathbf{p}_1 + \cdots + c_{k+1} \sigma_{k+1} \mathbf{p}_{k+1}.$$

Since $\mathbf{p}_1, \ldots, \mathbf{p}_{k+1}$ are also orthonormal,

$$\| (A - B) \mathbf{u}^\star \|^2 = c_1^2 \sigma_1^2 + \cdots + c_{k+1}^2 \sigma_{k+1}^2 \geq ( c_1^2 + \cdots + c_{k+1}^2 ) \, \sigma_{k+1}^2 = \sigma_{k+1}^2.$$

Thus, using the definition (3.39) of the Euclidean matrix norm,

$$\| A - B \|_2 = \max \{ \, \| (A - B) \mathbf{u} \| \; | \; \| \mathbf{u} \| = 1 \, \} \geq \| (A - B) \mathbf{u}^\star \| \geq \sigma_{k+1}.$$

This proves that $\sigma_{k+1}$ is the minimum value of $\| A - B \|_2$ among all such matrices $B$. Q.E.D.


Remark. One cannot do any better with a matrix of lower rank, i.e., $\| A - B \|_2$ is also
minimized when $B = A_k$ among all matrices with $\operatorname{rank} B \leq k$. Justifying this statement is
left to Exercise 8.7.19.


Observe that the closest rank $k$ approximating matrix $A_k$ is unique unless the $(k + 1)$st
singular value equals the $k$th one: $\sigma_k = \sigma_{k+1}$, in which case one can replace the last
columns of $P_k$ and $Q_k$ with those coming from the $(k + 1)$st singular vectors. To compute
the rank $k$ approximating matrix $A_k$, we need only compute the largest $k$ singular values
$\sigma_1, \ldots, \sigma_k$ of $A$, which form the diagonal entries of $\Sigma_k$, and corresponding singular unit
vectors $\mathbf{q}_1, \ldots, \mathbf{q}_k$, which form the columns of $Q_k$. The columns of $P_k$ are formed by their
images $\mathbf{p}_1 = A \mathbf{q}_1 / \sigma_1, \ldots, \mathbf{p}_k = A \mathbf{q}_k / \sigma_k$.
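A minimal sketch of the low rank approximation of Theorem 8.74, assuming NumPy. The function name `best_rank_k`, the random test matrix, and the seed are illustrative choices, not taken from the text.

```python
import numpy as np

def best_rank_k(A, k):
    """Return A_k = P_k Sigma_k Q_k^T together with all singular values of A."""
    P, sigma, Qt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    A_k = P[:, :k] @ np.diag(sigma[:k]) @ Qt[:k, :]   # keep the k largest singular values
    return A_k, sigma

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))                  # hypothetical 6 x 4 test matrix
A2, sigma = best_rank_k(A, 2)

err = np.linalg.norm(A - A2, 2)              # Euclidean matrix norm of the difference
print(np.linalg.matrix_rank(A2))             # rank of the approximation
print(err, sigma[2])                         # the error equals sigma_{k+1} with k = 2
```

Any other rank 2 matrix, for instance a product of random $6 \times 2$ and $2 \times 4$ factors, lies at least as far from `A` in this norm.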

Consequently, when solving an ill-conditioned linear system $A \mathbf{x} = \mathbf{b}$, a common and
effective regularization strategy is to eliminate all "insignificant" singular values below a
specified cut-off, replacing $A$ by its rank $k$ approximation $A_k$ specified by Theorem 8.74,
where $k$ denotes the number of significant singular values. Applying the corresponding
approximating pseudoinverse $A_k^+ = Q_k \Sigma_k^{-1} P_k^T$ to solve for $\mathbf{x}^\star = A_k^+ \mathbf{b}$ will, in favorable
situations, effectively circumvent the effects of ill-conditioning.
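The regularization strategy just described might be sketched as follows, assuming NumPy. The function name `truncated_pinv_solve`, the cut-off `1e-4`, and the slightly perturbed right-hand side `b` are hypothetical choices for this illustration; the coefficient matrix is the nearly rank 1 matrix discussed earlier in this subsection.

```python
import numpy as np

def truncated_pinv_solve(A, b, cutoff):
    """Solve A x = b using only the singular values of A that exceed cutoff."""
    P, sigma, Qt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    k = int(np.sum(sigma > cutoff))                 # number of significant singular values
    Pk, Sk, Qk = P[:, :k], sigma[:k], Qt[:k, :].T
    x = Qk @ ((Pk.T @ b) / Sk)                      # x = Q_k Sigma_k^{-1} P_k^T b
    return x, k

A_tilde = np.array([[1.00001, 1.0, -1.0],
                    [2.0, 2.00001, -2.0],
                    [3.0, 3.0, -3.00001]])
b = np.array([1.0, 2.0, 3.001])                     # hypothetical, slightly noisy data

x_reg, k = truncated_pinv_solve(A_tilde, b, cutoff=1e-4)
x_direct = np.linalg.solve(A_tilde, b)              # unregularized solution

print(k, x_reg)
print(np.linalg.norm(x_direct))                     # much larger: noise is amplified
```

The small perturbation in the last entry of `b` is magnified by the tiny singular values in the direct solve, whereas the truncated solution stays of moderate size.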

Another  common  application  of  low  rank  approximations  is  in  data  compression,  in 
which  one  replaces  a  very  large  data  matrix,  e.g.,  one  obtained  from  high-resolution  digital 
images,  by  a  suitable  low  rank  approximation  that  captures  the  essential  features  of  the 
data  set  while  reducing  overall  storage  requirements  and  thereby  accelerating  subsequent 
analysis  thereof. 


Spectral  Graph  Theory 

Spectral graph theory, [14, 76], refers to the study of the properties of graphs that are
captured by the spectrum, meaning the eigenvalues and singular values, of certain naturally
associated matrices. The graph Laplacian matrix, which we encountered in Section 6.2, is of
particular importance. Recall that it is defined as the Gram matrix, $K = A^T A$, constructed
from the incidence matrix $A$ of any underlying digraph, noting that the directions assigned
to the edges do not affect the ultimate form of $K$. The eigenvalues of the graph Laplacian
matrix are, by definition, the squares of the singular values of the incidence matrix.
As we know (see Exercise 2.6.12), the dimension of the kernel of the incidence
matrix, and hence also that of its graph Laplacian matrix, equals the number of connected
components of the graph. In particular, a connected graph has a one-dimensional kernel
spanned by the vector $(1, 1, \ldots, 1)^T$. The magnitude of its final, meaning smallest, singular
value, $\sigma_r$, can be interpreted as a measure of how close the graph is to being disconnected,
since if it were zero (and thus technically not a singular value), the graph would have
(at least) two connected components. This is borne out by numerical experiments, which
demonstrate that a graph with a small final singular value $\sigma_r$ can be disconnected by
deleting a relatively small number of its edges.
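The relation between the kernel of the graph Laplacian and connectivity can be explored numerically. The sketch below assumes NumPy; the helper names `graph_laplacian` and `count_components` are made up for this illustration. The test graph, a complete graph on vertices 1 to 4 joined by the single edge 4–5 to a 4-cycle on vertices 5 to 8, is an assumption: it is chosen to be consistent with the spectra quoted for the disconnected graph in Example 8.75, but the figure itself is not reproduced here.

```python
import numpy as np

def graph_laplacian(n, edges):
    """K = A^T A, where A is the incidence matrix of a digraph on vertices 1..n."""
    A = np.zeros((len(edges), n))
    for row, (i, j) in enumerate(edges):
        A[row, i - 1] = 1.0      # one end of the edge
        A[row, j - 1] = -1.0     # other end; swapping the two does not change K
    return A.T @ A

def count_components(K, tol=1e-9):
    """Number of connected components = dimension of ker K."""
    return int(np.sum(np.linalg.eigvalsh(K) < tol))

complete = [(i, j) for i in range(1, 5) for j in range(i + 1, 5)]   # complete graph on 1..4
cycle = [(5, 6), (6, 7), (7, 8), (8, 5)]                            # 4-cycle on 5..8
bridge = [(4, 5)]                                                   # single connecting edge

K_joined = graph_laplacian(8, complete + cycle + bridge)
K_split = graph_laplacian(8, complete + cycle)
spectrum_joined = np.linalg.eigvalsh(K_joined)[::-1]   # eigenvalues, decreasing
spectrum_split = np.linalg.eigvalsh(K_split)[::-1]

print(count_components(K_joined), count_components(K_split))
print(np.round(spectrum_joined, 4))
print(np.round(spectrum_split, 4))
```

Deleting the connecting edge raises the dimension of the kernel from one to two, and the smallest nonzero eigenvalue of the joined graph is noticeably smaller than those of either piece.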


Example 8.75. Consider the graph sketched in Figure 8.4. Using the indicated vertex
labels, we can construct its graph Laplacian directly using the recipe found in Section 6.2:
[...] To four decimal places, the eigenvalues are 5.3234, [...] is not
especially well connected. Indeed, we [...] edge from vertex 4 to vertex 5. The resulting
disconnected graph Laplacian is the block diagonal [...]
whose spectrum is the union of the spectra of the two constituent subgraphs, the left one
having a triple eigenvalue of 4 and a zero eigenvalue, the right one having eigenvalues
4, 2, 2, 0. The singular values are their square roots. Note that these are fairly close to
those of the original connected graph. Such observations are even more striking when one
is dealing with much larger graphs.

Spectral graph theory plays an increasingly important role in theoretical computer science
and data analysis. Applications include partitioning and coloring graphs, random
graphs and random walks on graphs, routing and networks, and spectral clustering. The
PageRank algorithm that underlies Google's search engine is based on representing the
web pages on the internet as a gigantic digraph,† which is then viewed as a probabilistic
Markov process; see Section 9.3 for further details.

† According to our conventions, the internet digraph is not simple, since vertices can have two
directed edges connecting them, one if the first web page links to the second and another if the
second links to the first.

Figure 8.4. An Almost Disconnected Graph.

A basic example is the complete graph $G_n$ on $n$ vertices, which has one edge joining every
distinct pair of vertices, and hence is the most connected simple graph; see Exercise 2.6.10.
Its graph Laplacian is easily constructed, and is the $n \times n$ matrix $K_n = n\,I - E$, where $E$ is
the $n \times n$ matrix with every entry equal to 1. Since $\dim \ker E = n - 1$ (why?), we see that
$K_n$ has a single nonzero eigenvalue, namely $\lambda = n$, of multiplicity $n - 1$, along with its
zero eigenvalue. Thus, the complete graph on $n$ vertices has $n - 1$ identical singular values:
$\sigma_1 = \cdots = \sigma_{n-1} = \sqrt{n}$. Motivated by this observation, graphs whose nonzero singular
values are close together are, in a certain sense, very highly connected, and are known
as expander graphs. Expander graphs have many remarkable properties, which underlie
their many applications, including communication networks, error-correcting codes,
fault-tolerant circuits, pseudo-random number generators, Markov processes, statistical physics,
as well as more theoretical disciplines such as group theory and geometry, [45].
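The eigenvalue claim for the complete graph can be checked numerically. The following Python sketch (standard library only; the helper names `complete_graph_laplacian` and `mat_vec` are illustrative, not from the text) builds $K_n = n\,I - E$ for $n = 5$ and verifies that the all-ones vector lies in the kernel, while each difference of basis vectors $e_1 - e_j$ is an eigenvector with eigenvalue $n$.

```python
def complete_graph_laplacian(n):
    """Return K_n = n I - E as a list of rows (E is the all-ones matrix)."""
    return [[(n - 1) if i == j else -1 for j in range(n)] for i in range(n)]


def mat_vec(K, v):
    """Multiply the square matrix K by the vector v."""
    return [sum(k_ij * v_j for k_ij, v_j in zip(row, v)) for row in K]


n = 5
K = complete_graph_laplacian(n)

# The all-ones vector e spans the kernel: K e = 0.
e = [1] * n
kernel_image = mat_vec(K, e)

# Each difference e_1 - e_j (j = 2, ..., n) satisfies K v = n v.
# These n - 1 vectors are linearly independent, so the eigenvalue n
# has multiplicity n - 1, giving n - 1 singular values equal to sqrt(n).
eigen_checks = []
for j in range(1, n):
    v = [0] * n
    v[0], v[j] = 1, -1
    eigen_checks.append(mat_vec(K, v) == [n * v_i for v_i in v])

print(kernel_image, all(eigen_checks))
```

Integer arithmetic keeps the comparison exact, so no rounding tolerance is needed here.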


Exercises 


8.7.1.  Find  the  singular  values  of  the  following  matrices: 


0 

3 


8.7.2.  Write  out  the  singular  value  decomposition  (8.52)  of  the  matrices  in  Exercise  8.7.1. 

8.7.3.  (a)  Construct  the  singular  value  decomposition  of  the  shear  matrix  A  = 

(b)  Explain  how  a  shear  can  be  realized  as  a  combination  of  a  rotation,  and  a  stretch, 
followed  by  a  second  rotation. 


8.7.4.  Find  the  condition  number  of  the  following  matrices.  Which  would  you  characterize 


as  ill-conditioned? 


( d ) 


(-1  3 
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8.7.5.  Find  the  closest  rank  1  and  rank  2  matrices  to  the  matrices  in  Exercise  8.7.4. 




♠ 8.7.6. Solve the following systems of equations using Gaussian Elimination with three-digit
rounding  arithmetic.  Is  your  answer  a  reasonable  approximation  to  the  exact  solution? 
Compare  the  accuracy  of  your  answers  with  the  condition  number  of  the  coefficient  matrix, 
and  discuss  the  implications  of  ill-conditioning. 

97  x  +  175  y  +  832;  =  1,  3.001x  +  2.999  y  +  5z  =  1, 

1000  x  4-  999  v  =  1 

(a)  ’  (b)  Mx  +  78y  +  37 z  =  1,  (c)  -x  +  1.002 y  -  2.999 z  =  2 

v  7  554x -I-  555iy  =  —  1  v  7  y  ’  v  J  y 

*  ’  52x  +  97  y  +  46  2;  =  1.  2.002x  +  Ay  +  2^  =  1.002. 


♠ 8.7.7. (a) Compute the singular values and condition numbers of the $2 \times 2$, $3 \times 3$, and $4 \times 4$
Hilbert matrices. (b) What is the smallest Hilbert matrix with condition number larger
than $10^6$?


8.7.8. (a) What are the singular values of a $1 \times n$ matrix? (b) Write down its singular value
decomposition. (c) Write down its pseudoinverse.


8.7.9.  Answer  Exercise  8.7.8  for  an  m  x  1  matrix. 

8.7.10.  True  or  false:  Every  matrix  has  at  least  one  singular  value. 

8.7.11. Explain why the singular values of $A$ are the same as the nonzero eigenvalues of the
positive definite square root matrix $S = \sqrt{A^T A}$, defined in Exercise 8.5.27.

♦ 8.7.12. Prove that if the square matrix $A$ is nonsingular, then the singular values of $A^{-1}$ are
the reciprocals of the singular values of $A$. How are their condition numbers related?

8.7.13. True or false: The singular values of $A^T$ are the same as the singular values of $A$.


♦ 8.7.14. (a) Let $A$ be a nonsingular matrix. Prove that the product of the singular values of $A$
equals the absolute value of its determinant: $\sigma_1 \sigma_2 \cdots \sigma_n = |\det A|$.

(b) Does their sum equal the absolute value of the trace: $\sigma_1 + \cdots + \sigma_n = |\operatorname{tr} A|$?

(c)  Show  that  if  |  det  A  |  10  ^ ,  then  its  minimal  singular  value  satisfies  cn  <C  10  ^ 

(d)  True  or  false:  A  matrix  whose  determinant  is  very  small  is  ill-conditioned. 

(e)  Construct  an  ill-conditioned  matrix  with  det  A  =  1. 

8.7.15.  True  or  false:  If  A  is  a  symmetric  matrix,  then  its  singular  values  are  the  same  as  its 
eigenvalues. 


8.7.16.  True  or  false:  If  U  is  an  upper  triangular  matrix  whose  diagonal  entries  are  all  positive, 
then  its  singular  values  are  the  same  as  its  diagonal  entries. 

8.7.17. True or false: The singular values of $A^2$ are the squares of the singular values of $A$.

8.7.18. True or false: If $B = S^{-1} A S$ are similar matrices, then $A$ and $B$ have the same
singular values.


♦ 8.7.19. Under the assumptions of Theorem 8.74, show that $\|A - B\|_2$ is also minimized when
$B = A_k$ among all matrices with $\operatorname{rank} B \le k$.

♦ 8.7.20. Suppose $A$ is an $m \times n$ matrix of rank $r < n$. Prove that there exist arbitrarily close
matrices of maximal rank, that is, for every $\varepsilon > 0$ there exists an $m \times n$ matrix $B$ with
$\operatorname{rank} B = n$ such that the Euclidean matrix norm $\|A - B\| < \varepsilon$.


8.7.21.  True  or  false:  If  det  A  >  1,  then  A  is  not  ill-conditioned. 


8.7.22. Let $A = \begin{pmatrix} 6 & -4 & 1 \\ -4 & 6 & -1 \\ 1 & -1 & 11 \end{pmatrix}$, and let $E = \{\, y = A x \mid \|x\| = 1 \,\}$ be the image of the unit
Euclidean sphere under the linear map induced by $A$. (a) Explain why $E$ is an ellipsoid
and write down its equation. (b) What are its principal axes and their lengths, that is, the
semi-axes of the ellipsoid? (c) What is the volume of the solid ellipsoidal domain enclosed by $E$?

C 8.7.23. Let $A$ be a nonsingular $2 \times 2$ matrix with singular value decomposition $A = P \Sigma Q^T$ and
singular values $\sigma_1 \ge \sigma_2 > 0$. (a) Prove that the image of the unit (Euclidean) circle under
the linear transformation defined by $A$ is an ellipse, $E = \{\, A x \mid \|x\| = 1 \,\}$, whose principal
axes are the columns $p_1, p_2$ of $P$, and whose corresponding semi-axes are the singular
values $\sigma_1, \sigma_2$. (b) Show that if $A$ is symmetric, then the ellipse's principal axes are the
eigenvectors of $A$ and the semi-axes are the absolute values of its eigenvalues. (c) Prove
that the area of $E$ equals $\pi\,|\det A|$. (d) Find the principal axes, semi-axes, and area of the

ellipses  defined  by  (i)  I  9  }  ),  (ii)  (  ?  \  ),  (in)  (5  f  V  (e)  What  happens  if 

A  is  singular?  \  J  \  /  \  1 

♦ 8.7.24. Optimization Principles for Singular Values: Let $A$ be any nonzero $m \times n$ matrix.
Prove that (a) $\sigma_1 = \max\{\, \|A u\| \mid \|u\| = 1 \,\}$. (b) Is the minimum the smallest singular
value? (c) Can you design optimization principles for the intermediate singular values?

♦ 8.7.25. Let $A$ be a square matrix. Prove that its maximal eigenvalue is smaller than its
maximal singular value: $\max |\lambda_i| \le \max \sigma_i$. Hint: Use Exercise 8.7.24.

8.7.26.  Compute  the  Euclidean  matrix  norm  of  the  following  matrices. 


(a) 


(e) 


(b) 


4  \ 
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8.7.27.  Find  a  matrix  A  whose  Euclidean  matrix  norm  satisfies  A  ||  /  ||  A 
♦ 8.7.28. Let $K > 0$ be a positive definite matrix. Characterize the matrix norm induced by the
inner product $\langle x, y \rangle = x^T K y$. Hint: Use Exercise 8.5.45.

8.7.29.  Let  A  =  (  j  ^  V  Compute  the  matrix  norm  ||A||  using  the  following  norms  in  IR2: 


(a)  the  weighted  oo  norm 

v 


=  max{2  |  v1  |,  3 1  v2  \  };  (b)  the  weighted  1  norm 
=  2  |  v i  |  +  3 1  v2  |;  (c)  the  weighted  inner  product  norm  ||  v 

(d)  the  norm  associated  with  the  positive  definite  matrix  K  = 


=  \/2vf  +  3v 


2  . 

2  5 


2  -1 

-1  2 


♦ 8.7.30. Let $A$ be an $n \times n$ matrix with singular value vector $\sigma = (\sigma_1, \ldots, \sigma_r)$. Prove that
(a) $\|\sigma\|_\infty = \|A\|_2$; (b) $\|\sigma\|_2 = \|A\|_F$, the Frobenius norm of Exercise 3.3.51.

Remark. The 1 norm of the singular value vector also defines a useful matrix norm,
known as the Ky Fan norm.

♦ 8.7.31. Let $A$ be an $m \times n$ matrix with singular values $\sigma_1, \ldots, \sigma_r$. Prove that

$$\sum_{i=1}^{r} \sigma_i^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2. \tag{8.63}$$


8.7.32. Let $A$ be a nonsingular square matrix. Prove the following formulas for its condition
number:

(a) $\kappa(A) = \dfrac{\max\{\, \|A u\| \mid \|u\| = 1 \,\}}{\min\{\, \|A u\| \mid \|u\| = 1 \,\}}$; (b) $\kappa(A) = \|A\|_2\, \|A^{-1}\|_2$.


8.7.33.  Find  the  pseudoinverse  of  the  following  matrices:  (a) 


1  -1 
3  3 


(b) 


2- 


1  -2 

2  1 


(2 

0\ 

^0 

0 

1\ 

(c) 

0 

-1 

,  (d) 

0 

-1 

0 

0/ 

^0 

0 

0/ 

(e) 
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8.7.34.  Use  the  pseudoinverse  to  find  the  least  squares  solution  of  minimal  norm  to  the 

x  —  3y  - 

(c)  2  x  +  y  - 
x  +  y  = 


following  linear  systems:  (a) 


x  +  y  =  1, 
3x  +  3  y  =  —2; 


(*>) 


x  +  y  +  z  =  5, 
2x  —  y  A  z  =  2; 


2, 

-1. 

0. 


T 8.7.35. Prove that the pseudoinverse satisfies the following identities: (a) $(A^+)^+ = A$,
(b) $A A^+ A = A$, (c) $A^+ A A^+ = A^+$, (d) $(A A^+)^T = A A^+$, (e) $(A^+ A)^T = A^+ A$.

8.7.36. Suppose $b \in \operatorname{img} A$ and $\ker A = \{0\}$. Prove that $x^\star = A^+ b$ is the unique solution to
the linear system $A x = b$. What if $\ker A \ne \{0\}$?

8.7.37. Choose a direction for each of the edges and write down the incidence matrix $A$ for the
graph sketched in Figure 8.4. Verify that its graph Laplacian (8.62) equals $K = A^T A$.


8.7.38.  Determine  the  spectrum  for  the  graphs  in  Exercise  2.6.3. 

8.7.39. Determine the spectrum of a graph given by the edges of (i) a triangle; (ii) a square;
(iii) a pentagon. Can you determine the formula for the spectrum of the graph given by an
$n$-sided polygon? Hint: See Exercise 8.2.49.


8.7.40.  Determine  the  spectrum  for  the  trees  in  Exercise  2.6.9.  Can  you  make  any  conjectures 
about  the  nature  of  the  spectrum  of  a  graph  that  is  a  tree? 


8.8  Principal  Component  Analysis 

Singular values and vectors also underlie contemporary statistical data analysis. In particular,
the method of Principal Component Analysis has assumed an increasingly essential
role in a wide range of applications, including data mining, machine learning, image processing,
speech recognition, semantics, face recognition, and health informatics; see [47]
and the references therein. Given a large data matrix, containing many data points
belonging to a high-dimensional vector space, the singular vectors associated with the
larger singular values indicate the principal components of the data, while small singular
values indicate relatively unimportant components. Projection onto the low-dimensional
subspaces spanned by the dominant singular vectors can expose structure in otherwise
inscrutable large data sets. The earliest descriptions of the method of Principal Component
Analysis are to be found in the first half of the twentieth century in the work of the
British statistician Karl Pearson, [65], and American statistician Harold Hotelling, [46].

Variance  and  Covariance 

We begin with a brief description of basic statistical concepts. Suppose that $x_1, \ldots, x_m \in \mathbb{R}$
represent  a  collection  of  m  measurements  of  a  single  physical  quantity,  e.g.,  the  distance  to 
a  star  as  measured  by  various  physical  apparatuses,  the  speed  of  a  car  at  a  given  instant 
measured  by  a  collection  of  instruments,  a  person’s  IQ  as  measured  by  a  series  of  tests, 
etc.  Experimental  error,  statistical  fluctuations,  quantum  mechanical  effects,  numerical 
approximations,  and  the  like  imply  that  the  individual  measurements  will  almost  certainly 
not  precisely  agree.  Nevertheless,  one  wants  to  know  the  most  likely  value  of  the  measured 
quantity  and  the  degree  of  confidence  that  one  has  in  the  proposed  value.  A  variety  of 
statistical  tests  have  been  devised  to  resolve  these  issues,  and  we  refer  the  reader  to,  for 
example,  [20,43,87]. 

The most basic collective quantity of such a data set is its mean, or average, denoted
by

$$\bar{x} = \frac{x_1 + \cdots + x_m}{m}. \tag{8.64}$$

Figure 8.5. Variance.


Barring  some  inherent  statistical  or  experimental  bias,  the  mean  can  be  viewed  as  the  most 
likely  value,  known  as  the  expected  value ,  of  the  quantity  being  measured,  and  thus  the 
best  bet  for  its  actual  value.  (More  generally,  if  the  measurements  are  sampled  from  a 
known  probability  distribution,  then  one  works  with  a  suitably  weighted  average.  To  keep 
the  formulas  relatively  simple,  we  will  assume  a  uniform  distribution  throughout,  and  leave 
generalizations  for  the  reader  to  pursue  by  consulting  the  statistical  literature.)  Once  this 
has  been  computed,  it  will  be  helpful  to  normalize  the  measurements  to  have  mean  zero, 
which  is  done  by  subtracting  off  their  mean,  letting 

$$a_i = x_i - \bar{x}, \quad i = 1, \ldots, m, \qquad \text{with} \qquad \bar{a} = \frac{a_1 + \cdots + a_m}{m} = 0, \tag{8.65}$$

represent  the  deviations  of  the  measurements  from  their  overall  mean.  It  will  also  help  to 
assemble  these  quantities  into  column  vectors: 


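$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix}, \qquad a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix} = x - \bar{x}\, e, \qquad \text{where} \qquad e = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \in \mathbb{R}^m, \qquad \bar{x} = \frac{1}{m}\, e^T x.$$

Thus, we can write the normalized measurement vector as

$$a = \Bigl( I - \frac{1}{m}\, E \Bigr) x, \tag{8.66}$$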


where  I  is  the  m  x  m  identity  matrix  and  E  =  eeT  is  the  m  x  m  matrix  all  of  whose 
entries  equal  1. 
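As a small illustration of (8.64)-(8.66), the following Python sketch computes the mean and the deviations of the measurements listed in Exercise 8.8.1(a), once directly and once through the centering matrix $I - \frac{1}{m} E$. The function names are illustrative choices, not notation from the text.

```python
def mean(xs):
    """Mean (8.64) of a list of real measurements."""
    return sum(xs) / len(xs)


def deviations(xs):
    """Deviations a_i = x_i - mean, as in (8.65)."""
    xbar = mean(xs)
    return [x - xbar for x in xs]


# Measurements from Exercise 8.8.1(a).
x = [1.1, 1.3, 1.5, 1.55, 1.6, 1.9, 2.0, 2.1]
m = len(x)

xbar = mean(x)
a = deviations(x)

# Formula (8.66): a = (I - E/m) x, where E is the all-ones matrix.
centering = [[(1.0 if i == j else 0.0) - 1.0 / m for j in range(m)]
             for i in range(m)]
a_matrix = [sum(c * xj for c, xj in zip(row, x)) for row in centering]

print(xbar)            # mean of the measurements
print(sum(a))          # zero up to rounding: the deviations have mean zero
```

Forming the full $m \times m$ centering matrix is wasteful for large $m$; it appears here only to mirror (8.66), and in practice one subtracts the mean directly.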

The measurement variance tells us how widely the data points are “scattered” about
their mean. As in least squares analysis, this is quantified by summing the squares of their
deviations, and denoted by

$$\sigma_x^2 = \nu\, \|a\|^2 = \nu \sum_{i=1}^{m} a_i^2 = \nu \sum_{i=1}^{m} (x_i - \bar{x})^2, \tag{8.67}$$

where we continue to use the usual Euclidean norm† throughout, and $\nu > 0$ is a certain
specified prefactor, which could also be viewed as an overall weight to the Euclidean norm.
The square root of the variance is known as the standard deviation, and denoted by

$$\sigma = \sigma_x = \sqrt{\nu}\, \|a\|. \tag{8.68}$$

The prefactor $\nu$ can assume different values depending upon one's statistical objectives;
common examples are (a) $\nu = 1/m$ for the “naive” variance; (b) $\nu = 1/(m - 1)$ (assuming
$m > 1$, i.e., there are at least 2 measurements) for an unbiased version; (c) $\nu = 1/(m + 1)$
for the minimal mean squared estimation of variance; and (d) more exotic
choices, e.g., if one desires an unbiased estimation of standard deviation instead of variance,
cf. [43; p. 349]. Fortunately, apart from the resulting numerical values, the underlying
analysis is independent of the prefactor.

† More general probability distributions rely on suitably weighted norms, which can be
straightforwardly incorporated into the mathematical framework.

Figure 8.6. Correlation. (Five scatter plots, with correlations $\rho_{xy} = -.95,\ -.7,\ 0,\ .7,\ .95$.)

The  smaller  the  variance  or  standard  deviation,  the  less  spread  out  the  measurements, 
and hence the more accurately the mean $\bar{x}$ is expected to approximate the true value of
the physical quantity. Figure 8.5 contains several scatter plots, in which each real-valued
measurement  is  indicated  by  a  dot  and  their  mean  is  represented  by  a  small  vertical  bar. 
The  left  plot  shows  data  with  relatively  small  variance,  since  the  measurements  are  closely 
clustered  about  their  mean,  whereas  on  the  right  plot,  the  variance  is  large  because  the 
data  is  fairly  spread  out. 
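To make the role of the prefactor concrete, here is a minimal Python sketch of the variance and standard deviation for the three common choices $\nu = 1/m$, $1/(m-1)$, and $1/(m+1)$. The data values are hypothetical, chosen only so that the arithmetic is easy to follow, and the function names are illustrative.

```python
import math


def variance(xs, nu):
    """Variance nu * ||a||^2, where a holds the deviations from the mean."""
    xbar = sum(xs) / len(xs)
    return nu * sum((x - xbar) ** 2 for x in xs)


def standard_deviation(xs, nu):
    """Square root of the variance."""
    return math.sqrt(variance(xs, nu))


# Hypothetical measurements: mean 4, squared deviations sum to 8.
x = [2.0, 4.0, 4.0, 6.0]
m = len(x)

naive = variance(x, 1.0 / m)            # 8 / 4
unbiased = variance(x, 1.0 / (m - 1))   # 8 / 3
min_mse = variance(x, 1.0 / (m + 1))    # 8 / 5

print(naive, unbiased, min_mse)
print(standard_deviation(x, 1.0 / m))
```

Only the overall scale changes with the prefactor; the ordering of variances across data sets, and hence the analysis built on them, is unaffected.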

Now suppose we make measurements of several different physical quantities. The individual
variances in themselves fail to capture many important features of the resulting
data set. For example, Figure 8.6 shows the scatter plots of data sets each representing
simultaneous measurements of two quantities, as specified by their horizontal and vertical
coordinates. All have the same variances, both individual and cumulative, but clearly represent
different interrelationships between the two quantities. In the central plot, they are
completely uncorrelated, while on either side they are progressively more correlated (or
anti-correlated), meaning that the value of the first measurement is a strong indicator of
the value of the second.

This motivates introducing what is known as the covariance between a pair of measured
quantities $x = (x_1, x_2, \ldots, x_m)^T$ and $y = (y_1, y_2, \ldots, y_m)^T$ as the expected value of the
product of the deviations from their respective means $\bar{x}, \bar{y}$. We set

$$\sigma_{xy} = \nu \sum_{k=1}^{m} (x_k - \bar{x})(y_k - \bar{y}) = \nu \sum_{k=1}^{m} a_k b_k = \nu\, a \cdot b, \tag{8.69}$$

where the normalized vector $a = (a_1, a_2, \ldots, a_m)^T$ has components $a_k = x_k - \bar{x}$, while
$b = (b_1, b_2, \ldots, b_m)^T$ has components $b_k = y_k - \bar{y}$. Note that, in view of (8.67), the
covariance of a set of measurements with itself is its variance: $\sigma_{xx} = \sigma_x^2$. The correlation
between the two measurement sets is then defined as

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\, \sigma_y}, \tag{8.70}$$

and is independent of the prefactor $\nu$. There is an overall bound on the correlation, since
the Cauchy-Schwarz inequality (3.18) implies that

$$|\sigma_{xy}| \le \sigma_x\, \sigma_y, \qquad \text{and hence} \qquad -1 \le \rho_{xy} \le 1. \tag{8.71}$$

The closer $\rho_{xy}$ is to $+1$, the more the variables are correlated; the closer to $-1$, the more
they are anti-correlated, while $\rho_{xy} = 0$ when the variables are uncorrelated. In Figure 8.6,
each scatter plot is labelled by its correlation. Statistically independent variables are automatically
uncorrelated, but the converse is not necessarily true, since correlation measures
only linear dependencies, and it is possible for nonlinearly dependent variables to nevertheless
have zero correlation.
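The last remark is easy to see on a small example. The Python sketch below computes covariance and correlation following (8.69) and (8.70), and applies them to hypothetical data: a linear relation, its negative, and the nonlinear relation $y = x^2$ sampled symmetrically about zero. The function names and data are illustrative assumptions.

```python
import math


def centered(xs):
    """Deviations from the mean."""
    xbar = sum(xs) / len(xs)
    return [x - xbar for x in xs]


def covariance(xs, ys, nu):
    """Covariance nu * (a . b) of two equally long measurement lists."""
    a, b = centered(xs), centered(ys)
    return nu * sum(ak * bk for ak, bk in zip(a, b))


def correlation(xs, ys):
    """Correlation; the prefactor cancels, so nu = 1 is used internally."""
    sxy = covariance(xs, ys, 1.0)
    sx = math.sqrt(covariance(xs, xs, 1.0))
    sy = math.sqrt(covariance(ys, ys, 1.0))
    return sxy / (sx * sy)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_linear = [3.0 * t + 1.0 for t in x]   # exact increasing linear relation
y_anti = [-t for t in x]                # exact decreasing linear relation
y_square = [t * t for t in x]           # fully dependent on x, but nonlinearly

rho_linear = correlation(x, y_linear)
rho_anti = correlation(x, y_anti)
rho_square = correlation(x, y_square)

print(rho_linear, rho_anti, rho_square)
```

Here `y_square` is completely determined by `x`, yet its correlation with `x` vanishes because the positive and negative deviations cancel in the sum.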

More generally, suppose we are given $m$ measurements of $n$ distinct quantities $x_1, \ldots, x_n$.
Let $X$ be the $m \times n$ data matrix whose entry $x_{ij}$ represents the $i$th value or measurement
of the $j$th quantity. The column $x_j = (x_{1j}, \ldots, x_{mj})^T$ of $X$ contains the measurements of
the $j$th quantity, while the row $\xi_i = (x_{i1}, \ldots, x_{in})$ is the $i$th data point. For example, if the
data comes from sampling a set $S \subset \mathbb{R}^n$, each row of the data matrix represents a different
sample point $\xi_i \in S$. Similarly, in image analysis, each row of the data matrix represents
an individual image, whose components are, say, gray scale data for the individual pixels,
or color components of pixels (in this case 3 or 4 components per pixel for the RGB or
CMYK color scales), or Fourier or wavelet coefficients representing the image, etc.

The (row) vector containing the various measurement means is

$$\bar{x} = (\bar{x}_1, \ldots, \bar{x}_n) = \frac{1}{m}\, e^T X. \tag{8.72}$$

We let $a_j$, with entries $a_{ij} = x_{ij} - \bar{x}_j$ representing the deviations from the mean, denote
the corresponding normalized (mean zero) measurement vectors, which form the columns
of the normalized data matrix

$$A = (\, a_1 \ \ldots \ a_n \,) = X - e\, \bar{x} = \Bigl( I - \frac{1}{m}\, E \Bigr) X; \tag{8.73}$$

cf. (8.66). The fact that the columns of $A$ all have mean zero is equivalent to the statement
that $e \in \operatorname{coker} A$. We will call the rows $\alpha_i = (a_{i1}, \ldots, a_{in})$ of $A$ the normalized data points.


We next define the $n \times n$ covariance matrix $K$ of the data set, whose entries equal
the pairwise covariances of the individual measurements:

$$k_{ij} = \sigma_{x_i x_j} = \nu\, a_i \cdot a_j = \nu \sum_{k=1}^{m} a_{ki}\, a_{kj} = \nu \sum_{k=1}^{m} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j), \qquad i, j = 1, \ldots, n. \tag{8.74}$$

The diagonal entries of $K$ are the individual variances: $k_{ii} = \sigma_{x_i x_i} = \sigma_{x_i}^2$. Observe that
the covariance matrix is, up to a factor, the symmetric Gram matrix for the normalized
measurement vectors $a_1, \ldots, a_n$, and hence

$$K = \nu\, A^T A. \tag{8.75}$$

Theorem 3.34 tells us that the covariance matrix is always positive semi-definite, $K \ge 0$,
and is positive definite, $K > 0$, unless the columns of $A$ are linearly dependent, or, equivalently,
there is a nontrivial exact linear relationship among the normalized measurement
vectors: $c_1 a_1 + \cdots + c_n a_n = 0$ with $c_1, \ldots, c_n$ not all zero, an unlikely event given the
presence of measurement errors and statistical fluctuations.
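The construction of the covariance matrix can be sketched in a few lines of Python. The data matrix below is hypothetical and deliberately degenerate: its second column is exactly twice its first, so the normalized columns satisfy $2 a_1 - a_2 = 0$ and $K$ is only positive semi-definite. The function names are illustrative.

```python
def column_means(X):
    """Row vector of the column means of the data matrix X."""
    m = len(X)
    return [sum(row[j] for row in X) / m for j in range(len(X[0]))]


def normalize(X):
    """Normalized data matrix A: subtract each column's mean."""
    xbar = column_means(X)
    return [[x_ij - xbar[j] for j, x_ij in enumerate(row)] for row in X]


def covariance_matrix(X, nu):
    """Covariance matrix K = nu * A^T A of the data matrix X."""
    A = normalize(X)
    n = len(A[0])
    return [[nu * sum(row[i] * row[j] for row in A) for j in range(n)]
            for i in range(n)]


# Hypothetical data: 4 measurements of 3 quantities; column 2 = 2 * column 1.
X = [[1.0, 2.0, 0.0],
     [2.0, 4.0, 1.0],
     [3.0, 6.0, 0.0],
     [4.0, 8.0, 1.0]]

K = covariance_matrix(X, nu=1.0)

# The exact linear relation 2 a_1 - a_2 = 0 puts c = (2, -1, 0) in ker K.
c = [2.0, -1.0, 0.0]
K_c = [sum(k_ij * c_j for k_ij, c_j in zip(row, c)) for row in K]

print(K)
print(K_c)
```

With real measurements, noise would perturb the second column slightly, the relation would no longer be exact, and $K$ would become positive definite with one very small eigenvalue.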


Remark. Principal Component Analysis requires that all n variables be measured the
same  number,  m,  of  times,  producing  a  rectangular  m  x  n  normalized  data  matrix  X,  all 
of  whose  entries  are  specified.  Extending  the  analysis  to  missing  or  unavailable  data  is  a 
very  active  area  of  contemporary  research,  which  we  unfortunately  do  not  have  space  to 
examine  here.  We  refer  the  interested  reader  to  [24,  28]  and  the  references  therein. 


The  Principal  Components 


The covariance matrix of a data set (8.75) encodes the information concerning the possible
linear dependencies and interrelationships among the data. However, due to its potentially
large size, it is often not easy to extract the important components and implications.
Nor is visualization a good option, since the scatter plots lie in a high-dimensional space.
Standard or random projections of high-dimensional data onto two- or three-dimensional
subspaces give some limited insight, but the results are highly dependent on the direction
of projection and tend to obscure any underlying structure. For example, projecting the
data sets in Figure 8.6 onto the x- and y-axes produces more or less the same results,
thereby hiding the variety of two-dimensional correlations. A more systematic approach is
to locate the so-called principal components of the data, and this leads us back to the
singular value decomposition of the data matrix.

The basic idea behind Principal Component Analysis, often abbreviated PCA, is to
focus on directions in which the variance of the data is especially large. Given the $m \times n$
normalized data matrix $A$, we define the first principal direction as that in which the data
experiences the most variance. By “direction”, we mean a line through the origin in $\mathbb{R}^n$,
and the variance is computed from the orthogonal projection of the data measurements
onto the line. Each line is spanned by a unit vector $u = (u_1, u_2, \ldots, u_n)^T$ with $\|u\| = 1$.
(Actually, there are two unit vectors, $\pm u$, in each line, but, as we will see, it doesn't
matter which one we choose.) The orthogonal projection of the $i$th normalized data point
$\alpha_i = (a_{i1}, \ldots, a_{in})$ onto the line spanned by $u$ is given by the projection formula (4.41),
namely $p_i = \alpha_i u$, $i = 1, \ldots, m$. The result is the projected measurement vector

$$p = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_m \end{pmatrix} = A u = u_1 a_1 + \cdots + u_n a_n. \tag{8.76}$$

Our goal is to maximize its variance

$$\sigma_p^2 = \nu\, \|p\|^2 = \nu\, \|A u\|^2 = \nu\, (A u)^T A u = \nu\, (u^T A^T A u) = u^T K u \tag{8.77}$$


over all possible choices of unit vector $u$. But, ignoring the irrelevant factor $\nu > 0$, this is
precisely the maximization problem that was solved by Theorem 8.40. We thus immediately
deduce that the first principal direction is given by the dominant unit eigenvector $u = q_1$
of the covariance matrix $K = \nu A^T A$, or, equivalently, the dominant unit singular vector of
the normalized data matrix $A$. The maximum variance is, up to a factor, the dominant,
or largest, eigenvalue $\lambda_1$ of $A^T A$, or, equivalently, the square of the dominant singular value, namely,
$\max_u \sigma_p^2 = \nu \lambda_1 = \nu \sigma_1^2$, while the dominant singular value is, again up to a factor, the
maximal standard deviation of the projected measurements: $\max_u \sigma_p = \sqrt{\nu}\, \sigma_1$.
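One way to see the first principal direction emerge numerically is to iterate $u \mapsto K u / \|K u\|$, which is the power method rather than the argument via Theorem 8.40 used above; it is swapped in here only because it needs no library support. The sketch below assumes a hypothetical normalized data matrix with four data points in the plane, takes $\nu = 1$, and assumes the starting vector is not orthogonal to the dominant eigenvector.

```python
import math


def gram(A, nu):
    """Covariance matrix K = nu * A^T A of a normalized data matrix A."""
    n = len(A[0])
    return [[nu * sum(r[i] * r[j] for r in A) for j in range(n)]
            for i in range(n)]


def projected_variance(A, u, nu):
    """Variance nu * ||A u||^2 of the data projected onto the unit vector u."""
    return nu * sum(sum(a * uj for a, uj in zip(row, u)) ** 2 for row in A)


def dominant_direction(K, iterations=100):
    """Power iteration: repeatedly apply K and renormalize."""
    n = len(K)
    u = [1.0] + [0.0] * (n - 1)  # assumed not orthogonal to the dominant eigenvector
    for _ in range(iterations):
        w = [sum(K[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        u = [wi / norm for wi in w]
    return u


# Hypothetical normalized data points (each column has mean zero).
A = [[3.0, 3.0], [-3.0, -3.0], [1.0, -1.0], [-1.0, 1.0]]
nu = 1.0

K = gram(A, nu)                      # [[20, 16], [16, 20]]
q1 = dominant_direction(K)           # approaches (1, 1) / sqrt(2)
max_var = projected_variance(A, q1, nu)
var_x_axis = projected_variance(A, [1.0, 0.0], nu)

print(q1, max_var, var_x_axis)
```

For this data the variance along the first principal direction is $36 = \sigma_1^2$, noticeably larger than the variance $20$ obtained by projecting onto the first coordinate axis.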

The second principal direction is assumed to be orthogonal to the first, so as to avoid
contaminating it with the already noted direction of maximal variance, and is to be chosen
so that the variance of its projected measurements is maximized among all such orthogonal
directions. Thus, the second principal direction will maximize $\sigma_p^2$, as given by (8.77), over all
unit vectors $u$ satisfying $u \cdot q_1 = 0$. More generally, given the first $j - 1$ principal directions
$q_1, \ldots, q_{j-1}$, the $j$th principal component is in the direction $u = q_j$ that maximizes the
variance

$$\sigma_p^2 = \nu\, (u^T A^T A u) \quad \text{over all vectors } u \text{ satisfying} \quad \|u\| = 1, \quad u \cdot q_1 = \cdots = u \cdot q_{j-1} = 0.$$

Theorem 8.42 immediately implies that $q_j$ is a unit eigenvector of $K$ associated with its
$j$th largest eigenvalue; the corresponding eigenvalue of $A^T A$ is $\lambda_j = \sigma_j^2$, which therefore is,
up to a factor, the $j$th principal variance.
Summarizing, we have proved the Fundamental Theorem of Principal Component Analysis.


Theorem 8.76. The $j$th principal direction of a normalized data matrix $A$ is its $j$th unit
singular vector $q_j$. The corresponding principal standard deviation $\sqrt{\nu}\, \sigma_j$ is proportional
to its $j$th singular value $\sigma_j$.

In applications, one designates a certain number, say $k$, of the dominant (largest) variances
$\nu \sigma_1^2 \ge \nu \sigma_2^2 \ge \cdots \ge \nu \sigma_k^2$ as “principal” and the corresponding unit singular vectors
$q_1, \ldots, q_k$ as the principal directions. The value of $k$ depends on the user and on the application.
For example, in visualization, $k = 2$ or 3 in order to plot these components of the
data in the plane or in space. More generally, one could specify $k$ based on some overall size
threshold, or where there is a perceived gap in the magnitudes of the variances. Another
choice is such that the principal variances are those that make up some large fraction, e.g.,
$\mu = 95\%$, of the total variance:

$$\sum_{i=1}^{k} \sigma_i^2 \ \ge\ \mu \Bigl( \sum_{i=1}^{r} \sigma_i^2 \Bigr) = \mu \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 = \mu \sum_{j=1}^{n} \|a_j\|^2, \tag{8.78}$$

where the next-to-last equality follows from (8.63).
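The fraction-of-total-variance criterion translates directly into a short routine. The following Python sketch assumes that one wants the smallest $k$ whose leading squared singular values reach the fraction $\mu$ of their total; the function name and the sample singular values are hypothetical.

```python
def number_of_principal_components(singular_values, mu=0.95):
    """Smallest k such that the k largest squared singular values
    account for at least the fraction mu of their total sum."""
    squares = sorted((s * s for s in singular_values), reverse=True)
    total = sum(squares)
    running = 0.0
    for k, sq in enumerate(squares, start=1):
        running += sq
        if running >= mu * total:
            return k
    return len(squares)


# Hypothetical singular values of three normalized data matrices.
k_one = number_of_principal_components([10.0, 1.0, 0.5])       # 100 / 101.25
k_two = number_of_principal_components([10.0, 3.0, 1.0, 0.5])  # 109 / 110.25
k_all = number_of_principal_components([6.0, 2.0])             # 36 / 40 < 0.95

print(k_one, k_two, k_all)
```

The prefactor $\nu$ multiplies both sides of the criterion and therefore plays no role in the choice of $k$.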

Theorem 8.74 says that, in the latter cases, the number of principal components, i.e.,
the number of significant singular values, will determine the approximate rank $k$ of the
covariance matrix $K$ and hence the data matrix $A$, also. Thus, the normalized data
(approximately) lies in a $k$-dimensional subspace. Further, the variance in any direction
orthogonal to principal directions is relatively small and hence relatively unimportant. As a
consequence, dimensional reduction by orthogonally projecting the data vectors onto the
$k$-dimensional subspace spanned by the principal directions (singular vectors) $q_1, \ldots, q_k$
serves to eliminate significant redundancies.

The coordinates of the data in the $j$th principal direction are provided by orthogonal
projection, namely the entries of the image vectors $A q_j = \sigma_j p_j$. Thus, approximating
the data by its $k$ principal components coincides with the closest rank $k$ approximation
$A_k = P_k \Sigma_k Q_k^T$ to the data matrix $A$ given by Theorem 8.74. Moreover, rewriting the data
in terms of the principal coordinates, meaning those supplied by the principal directions,
serves to diagonalize the principal covariance matrix $K_k = \nu A_k^T A_k$, since, as in (8.58),

$$Q_k^T (\nu A_k^T A_k)\, Q_k = \nu\, \Sigma_k^2,$$

the result being a diagonal matrix containing the $k$ principal variances along the diagonal.
This has the important consequence that the covariance between any two principal
components of the data is zero, and thus the principal components are all uncorrelated!
In geometric terms, the original data tends to form an ellipsoid in the high-dimensional
data space, and the principal directions are aligned with its principal semi-axes, thereby
conforming  to  and  exposing  the  intrinsic  geometry  of  the  data  set. 
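For readers who want the diagonalization step spelled out, the following short derivation fills in the algebra. It assumes, as is standard for the reduced singular value decomposition, that $P_k$ and $Q_k$ have orthonormal columns, so that $P_k^T P_k = I$ and $Q_k^T Q_k = I$, and that $\Sigma_k$ is the diagonal matrix of the first $k$ singular values.

```latex
\begin{aligned}
A_k^T A_k
  &= (P_k \Sigma_k Q_k^T)^T (P_k \Sigma_k Q_k^T)
   = Q_k \Sigma_k \,(P_k^T P_k)\, \Sigma_k Q_k^T
   = Q_k \Sigma_k^2 Q_k^T, \\[4pt]
Q_k^T (\nu A_k^T A_k)\, Q_k
  &= \nu \,(Q_k^T Q_k)\, \Sigma_k^2 \,(Q_k^T Q_k)
   = \nu\, \Sigma_k^2
   = \nu \operatorname{diag}(\sigma_1^2, \ldots, \sigma_k^2).
\end{aligned}
```

The off-diagonal entries of the result are the covariances between distinct principal coordinates, and they all vanish, which is the uncorrelatedness statement in the text.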
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Thus,  to  perform  Principal  Component  Analysis  on  a  complete  data  set,  consisting  of  m 
measurements of $n$ quantities, one forms the $m \times n$ normalized data matrix $A$ whose $(i, j)$th
entry equals the $i$th measurement of the $j$th variable minus the mean of the $j$th variable. The
principal directions are the first $k$ singular vectors of $A$, meaning the eigenvectors of the
positive (semi-)definite covariance matrix $K = \nu A^T A$, while the principal variances are,
up to the overall factor $\nu$, the corresponding eigenvalues of $A^T A$ or, equivalently, the squares of the
singular values. The principal coordinates are given by the entries of the resulting matrix
$P_k \Sigma_k$ whose columns are the image vectors $\sigma_j p_j = A q_j$ specifying the measurements in
the principal directions.
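A small worked example may help fix the procedure. The data below is hypothetical, chosen so that every quantity can be computed by hand: four normalized data points in the plane, with prefactor $\nu = 1$.

```latex
A = \begin{pmatrix} 3 & 3 \\ -3 & -3 \\ 1 & -1 \\ -1 & 1 \end{pmatrix},
\qquad
K = A^T A = \begin{pmatrix} 20 & 16 \\ 16 & 20 \end{pmatrix},
\qquad
\lambda_1 = 36, \quad \lambda_2 = 4,
\qquad
q_1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
q_2 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}.

% Singular values and principal coordinates (columns of P_2 \Sigma_2):
\sigma_1 = 6, \quad \sigma_2 = 2,
\qquad
A q_1 = \begin{pmatrix} 3\sqrt{2} \\ -3\sqrt{2} \\ 0 \\ 0 \end{pmatrix} = \sigma_1 p_1,
\qquad
A q_2 = \begin{pmatrix} 0 \\ 0 \\ \sqrt{2} \\ -\sqrt{2} \end{pmatrix} = \sigma_2 p_2.

% Checks: principal variances and vanishing covariance between principal coordinates.
\|A q_1\|^2 = 36 = \sigma_1^2, \qquad
\|A q_2\|^2 = 4 = \sigma_2^2, \qquad
(A q_1) \cdot (A q_2) = 0.
```

The first principal direction carries $36/40 = 90\%$ of the total variance, so a $95\%$ threshold would retain both components, whereas a $90\%$ threshold would keep only the first.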


Exercises 


Note :  For  simplicity,  take  the  prefactor  v  =  1  unless  otherwise  indicated. 

8.8.1.  Find  the  mean,  the  variance,  and  the  standard  deviation  of  the  following  data  sets: 

(a)  1.1,1.3,1.5,1.55,1.6,1.9,2,2.1;  (b)  2.,  .9,  .7, 1.5,  2.6,  .3,  .8, 1.4;  (c)  -2.9, -.5,  .1, -1.5, 

-3.6, 1.3,  .4, -.7;  (d)  1.1,  .2,  .1,  .6, 1.3, -.4, -.1,  .4;  (e)  .9, -.4, -.8, .,  1., -1.6, -1.2, -.7. 

8.8.2.  Find  the  mean,  the  variance,  and  the  standard  deviation  of  the  data  sets 

{  f(pc)  |  x  =  i/10,  i  =  — 10, . . . ,  10  }  associated  with  the  following  functions  f(x): 

(a)  3x  +  1,  (b)  x2,  (c)  x3  —  2x,  (d)  e~x ,  (e)  tan-1  x. 

8.8.3.  Determine  the  variance  and  standard  deviation  of  the  normally  distributed  data  points 

{  e~x  |  x  =  i/10,  i  =  —10, . . . ,  10  }  for  a  =  1,  2,  and  10. 

8.8.4. Prove that $\sigma_{xy} = \overline{xy} - \bar{x}\, \bar{y}$, where $\bar{x}$ and $\bar{y}$ are the means of $\{x_i\}$ and $\{y_i\}$, respectively,
while $\overline{xy}$ denotes the mean of the product variable $\{x_i y_i\}$.

0  8.8.5.  Show  that  one  can  compute  the  variance  of  a  set  of  measurements  without  reference  to 
the  mean  by  the  following  formula 


\[ \sigma_x^2 = \frac{\nu}{2m} \sum_{i=1}^{m} \sum_{j=1}^{m} (x_i - x_j)^2 = \frac{\nu}{m} \sum_{i<j} (x_i - x_j)^2 . \]


8.8.6.  Let  A  be  an  m  x  n  matrix  that  is  normalized,  meaning  that  each  of  its  column  sums  is 
zero.  Show  that  A  B,  where  B  is  any  n  x  k  matrix,  is  also  normalized. 

8.8.7. Given a singular value decomposition $A = P\,\Sigma\,Q^T$, prove that if the columns of $A$ have zero mean, then so do the columns of $P\,\Sigma$.

8.8.8.  Construct  the  5x5  covariance  matrix  for  the  5  data  sets  in  Exercise  8.8.1  and  find  its 
principal  variances  and  principal  directions.  What  do  you  think  is  the  dimension  of  the 
subspace  the  data  lies  in? 

♠ 8.8.9. For each of the following subsets $S \subset \mathbb{R}^3$, (i) compute a fairly dense sample of data points $z_i \in S$; (ii) find the principal components of your data set, using $\mu = .95$ in the criterion in (8.78); (iii) using your principal components, estimate the dimension of the set $S$. Does your estimate coincide with the actual dimension? If not, explain any discrepancies.


(a) The line segment $S = \{\, (t+1,\ 3t-1,\ -2t)^T \mid -1 \le t \le 1 \,\}$;

(b) the set of points $z$ on the three coordinate axes with Euclidean norm $\|z\| \le 1$;

(c) the set of “probability vectors” $S = \{\, (x,y,z) \mid 0 \le x,y,z \le 1,\ x+y+z = 1 \,\}$;


z  I 

1  < 

(1 

z 

z  I 

1  oo 

(1 

z 

<  1 }  for  the  oo  norm; 

(g) the unit sphere $S = \{\, \|z\|_\infty = 1 \,\}$ for the $\infty$ norm.

♣ 8.8.10. Using the Euclidean norm, compute a fairly dense sample of points on the unit sphere $S = \{\, x \in \mathbb{R}^3 \mid \|x\| = 1 \,\}$. (a) Set $\mu = .95$ in (8.78), and then find the principal components of your data set. Do they indicate the two-dimensional nature of the sphere? If not, why not? (b) Now look at the subset of your data that is within a distance $r > 0$ of the north pole, i.e., $\|x - (0,0,1)^T\| \le r$, and compute its principal components. How small does $r$ need to be to reveal the actual dimension of $S$? Interpret your calculations.


0  8.8.11.  Show  that  the  first  principal  direction  q1  can  be  characterized  as  the  direction  of  the 
line  that  minimizes  the  sums  of  the  squares  of  its  distances  to  the  data  points. 

♥ 8.8.12. Let $\mathbf{x}_i = (x_i, y_i)^T$, $i = 1, \ldots, m$, be a set of data points in the plane. Suppose $L_* \subset \mathbb{R}^2$ is the line that minimizes the sum of the squares of the distances from the data points to it, i.e., $\sum_{i=1}^{m} \operatorname{dist}(\mathbf{x}_i, L)^2$, among all lines $L \subset \mathbb{R}^2$. (a) Prove that the mean $\bar{\mathbf{x}} = (\bar{x}, \bar{y})^T \in L_*$. (b) Use Exercise 8.8.11 to find $L_*$. (c) Apply your result to the data points in Example 5.14 and compare the resulting line $L_*$ with the least squares line that was found there.


Chapter 9
Iteration 


Iteration, meaning the repeated application of a process or function, appears in a surprisingly wide range of applications. Discrete dynamical systems, in which the time variable has been “quantized” into individual units (seconds, days, years, etc.), are modeled by iterative systems. Most numerical solution algorithms, for both linear and nonlinear systems, are based on iterative procedures. Starting with an initial guess, the successive iterates lead to closer and closer approximations to the true solution. For linear systems of equations, there are several iterative solution algorithms that can, in favorable situations, be employed as efficient alternatives to Gaussian Elimination. Iterative methods are particularly effective for solving the very large, sparse systems arising in the numerical solution of both ordinary and partial differential equations. All practical methods for computing eigenvalues and eigenvectors rely on some form of iteration. A detailed historical development of iterative methods for solving linear systems and eigenvalue problems can be found in the recent survey paper [84]. Probabilistic iterative models known as Markov chains govern basic stochastic processes and appear in genetics, population biology, scheduling, internet search, financial markets, and many other areas.

In  this  book,  we  will  treat  only  iteration  of  linear  systems.  (Nonlinear  iteration  is 
of  similar  importance  in  applied  mathematics  and  numerical  analysis,  and  we  refer  the 
interested  reader  to  [40,  66,  79]  for  details.)  Linear  iteration  coincides  with  multiplication 
by  successive  powers  of  a  matrix;  convergence  of  the  iterates  depends  on  the  magnitude  of 
its  eigenvalues.  We  present  a  variety  of  convergence  criteria  based  on  the  spectral  radius, 
on  matrix  norms,  and  on  eigenvalue  estimates  provided  by  the  Gershgorin  Theorem. 

We  will  then  turn  our  attention  to  some  classical  iterative  algorithms  that  can  be  used 
to  accurately  approximate  the  solutions  to  linear  algebraic  systems.  The  Jacobi  Method  is 
the  simplest,  while  an  evident  serialization  leads  to  the  Gauss-Seidel  Method.  Completely 
general  convergence  criteria  are  hard  to  formulate,  although  convergence  is  assured  for  the 
important  class  of  strictly  diagonally  dominant  matrices  that  arise  in  many  applications. 
A  simple  modification  of  the  Gauss-Seidel  Method,  known  as  Successive  Over-Relaxation 
(SOR),  can  dramatically  speed  up  the  convergence  rate. 

In Section 9.5, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices. Needless to say, we completely avoid trying to solve (or even write down) the characteristic polynomial equation. The basic Power Method and its variants, which are based on linear iteration, are used to effectively approximate selected eigenvalues. To calculate the complete system of eigenvalues and eigenvectors, the remarkable QR algorithm, which relies on the Gram-Schmidt orthogonalization procedure, is the method of choice, and we include a new proof of its convergence.

The following section describes some more recent “semi-direct” iterative algorithms for finding eigenvalues and solving linear systems that, in contrast to the classical iterative schemes, are guaranteed to eventually produce the exact solution. These are based on the idea of a Krylov subspace, spanned by the vectors generated by repeatedly multiplying an initial vector by the coefficient matrix. The Arnoldi and Lanczos algorithms are used to find a corresponding orthonormal basis for the Krylov subspaces, and thereby approximate (some of) the eigenvalues of the matrix. Two classes of solution methods are then presented: first, the Full Orthogonalization Method (FOM), which, for a positive definite matrix, produces the powerful technique known as Conjugate Gradients (CG), of particular importance in numerical approximation of partial differential equations. The second is the recent Generalized Minimal Residual Method (GMRES), which is effectively used for solving large sparse linear systems.

The  final  Section  9.7  introduces  the  basic  ideas  behind  wavelets,  a  powerful  and  widely 
used  alternative  to  Fourier  methods  for  signal  and  image  processing.  While  slightly  off 
topic,  it  provides  a  nice  application  of  orthogonality  and  iterative  techniques,  and  is  thus 
a  fitting  end  to  this  chapter. 


9.1  Linear  Iterative  Systems 

We  begin  with  the  basic  definition  of  an  iterative  system  of  linear  equations. 

Definition  9.1.  A  linear  iterative  system  takes  the  form 

\[ \mathbf{u}^{(k+1)} = T\,\mathbf{u}^{(k)}, \qquad \mathbf{u}^{(0)} = \mathbf{a}, \tag{9.1} \]

where  the  coefficient  matrix  T  has  size  n  x  n. 

We will consider both real and complex systems, and so the iterates† $\mathbf{u}^{(k)}$ are vectors either in $\mathbb{R}^n$ (which assumes that the coefficient matrix $T$ is also real) or in $\mathbb{C}^n$. A linear iterative system can be viewed as a discretized version of a first order system of linear ordinary differential equations, as in (8.9), in which the state of the system, as represented by the vector $\mathbf{u}^{(k)}$, changes at discrete time intervals, labeled by the index $k$.


Scalar  Systems 

As usual, to study systems one begins with an in-depth analysis of the scalar version. Consider the iterative equation

\[ u^{(k+1)} = \lambda\, u^{(k)}, \qquad u^{(0)} = a, \tag{9.2} \]

where $\lambda$, $a$ and the solution $u^{(k)}$ are all real or complex scalars. The general solution to (9.2) is easily found:

\[ u^{(1)} = \lambda u^{(0)} = \lambda a, \qquad u^{(2)} = \lambda u^{(1)} = \lambda^2 a, \qquad u^{(3)} = \lambda u^{(2)} = \lambda^3 a, \]

and, in general,

\[ u^{(k)} = \lambda^k a. \tag{9.3} \]

If the initial condition is $a = 0$, then the solution $u^{(k)} \equiv 0$ is constant. In other words, 0 is a fixed point or equilibrium solution for the iterative system because it does not change under iteration.

Example 9.2. Banks add interest to a savings account at discrete time intervals. For example, if the bank offers 5% interest compounded yearly, this means that the account balance will increase by 5% each year. Thus, assuming no deposits or withdrawals, the balance $u^{(k)}$ after $k$ years will satisfy the iterative equation (9.2) with $\lambda = 1 + r$, where


† Warning. The superscripts on $u^{(k)}$ refer to the iterate number, and should not be mistaken for derivatives.




[Figure 9.1. One-Dimensional Real Linear Iterative Systems. Panels: $0 < \lambda < 1$; $-1 < \lambda < 0$; $\lambda = -1$; $1 < \lambda$; $\lambda < -1$.]


$r = .05$ is the interest rate, and the 1 indicates that all the money remains in the account. Thus, after $k$ years, your account balance is

\[ u^{(k)} = (1 + r)^k a, \qquad \text{where} \qquad a = u^{(0)} \tag{9.4} \]

is your initial deposit. For example, if $u^{(0)} = a = \$1{,}000$, after 1 year, your account has $u^{(1)} = \$1{,}050$, after 10 years $u^{(10)} = \$1{,}628.89$, after 50 years $u^{(50)} = \$11{,}467.40$, and after 200 years $u^{(200)} = \$17{,}292{,}580.82$, a more than 17,000-fold increase.

When the interest is compounded monthly, the rate is still quoted on a yearly basis, and so you receive $\frac{1}{12}$ of the interest each month. If $\hat{u}^{(k)}$ denotes the balance after $k$ months, then, after $n$ years, the account balance will be $\hat{u}^{(12n)} = \left(1 + \frac{1}{12}\,r\right)^{12n} a$. Thus, when the interest rate of 5% is compounded monthly, your account balance is $\hat{u}^{(12)} = \$1{,}051.16$ after 1 year, $\hat{u}^{(120)} = \$1{,}647.01$ after 10 years, $\hat{u}^{(600)} = \$12{,}119.38$ after 50 years, and $\hat{u}^{(2400)} = \$21{,}573{,}572.66$ after 200 years. So, if you wait sufficiently long, compounding will have a dramatic effect. Similarly, daily compounding replaces 12 by 365.25, the number of days in a year. After 200 years, the balance will be \$22,011,396.03.
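The balances quoted in this example can be reproduced by a short Python sketch that iterates the scalar equation (9.2) with $\lambda = 1 + r/p$, where $p$ is the number of compounding periods per year, and compares the result with the closed form (9.3). The function names and the restriction to a whole number of periods per year are assumptions made for this illustration.

```python
def balance_by_iteration(a, r, periods_per_year, years):
    """Iterate u(k+1) = lam * u(k), starting from u(0) = a."""
    lam = 1.0 + r / periods_per_year
    u = a
    for _ in range(periods_per_year * years):
        u = lam * u
    return u

def balance_closed_form(a, r, periods_per_year, years):
    """Evaluate the explicit solution u(k) = lam**k * a."""
    lam = 1.0 + r / periods_per_year
    return lam ** (periods_per_year * years) * a

if __name__ == "__main__":
    for years in (1, 10, 50, 200):
        yearly = balance_by_iteration(1000.0, 0.05, 1, years)
        monthly = balance_by_iteration(1000.0, 0.05, 12, years)
        print(f"{years:3d} years: yearly {yearly:,.2f}  monthly {monthly:,.2f}")
```

Both routines agree up to floating-point roundoff; the iterative version mirrors the definition, while the closed form avoids the loop.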

Let us analyze the solutions of scalar iterative equations, starting with the case when $\lambda \in \mathbb{R}$ is a real constant. Aside from the equilibrium solution $u^{(k)} \equiv 0$, the iterates exhibit seven qualitatively different behaviors, depending on the size of the coefficient $\lambda$.

(a) If $\lambda = 0$, the solution immediately becomes zero, and stays there, whereby $u^{(k)} = 0$ for all $k \ge 1$.

(b) If $0 < \lambda < 1$, then the solution is of one sign, and tends monotonically to zero, so $u^{(k)} \to 0$ as $k \to \infty$.

(c) If $-1 < \lambda < 0$, then the solution tends to zero: $u^{(k)} \to 0$ as $k \to \infty$. Successive iterates have alternating signs.

(d) If $\lambda = 1$, the solution is constant: $u^{(k)} = a$, for all $k \ge 0$.

(e) If $\lambda = -1$, the solution bounces back and forth between two values; $u^{(k)} = (-1)^k a$.

(f) If $1 < \lambda < \infty$, then the iterates $u^{(k)}$ become unbounded. If $a > 0$, they tend monotonically to $+\infty$; if $a < 0$, they tend to $-\infty$.

(g) If $-\infty < \lambda < -1$, then the iterates $u^{(k)}$ also become unbounded, with alternating signs.

In Figure 9.1 we exhibit representative scatter plots for the nontrivial cases (b-g). The horizontal axis indicates the index $k$, and the vertical axis the solution value $u$. Each dot in the scatter plot represents an iterate $u^{(k)}$.

In the first three cases, the fixed point $u = 0$ is said to be asymptotically stable, since all solutions tend to 0 as $k \to \infty$. In cases (d) and (e), the zero solution is stable, since solutions with nearby initial data, $|a| \ll 1$, remain nearby. In the final two cases, the zero solution is unstable; every nonzero initial data $a \ne 0$, no matter how small, will give rise to a solution that eventually goes arbitrarily far away from equilibrium.

Let us also investigate complex scalar iterative systems. The coefficient $\lambda$ and the initial datum $a$ in (9.2) are allowed to be complex numbers. The solution is the same, (9.3), but now we need to know what happens when we raise a complex number $\lambda$ to a high power. The secret is to write $\lambda = r\,e^{i\theta}$ in polar form (3.93), where $r = |\lambda|$ is its modulus and $\theta = \operatorname{ph} \lambda$ its angle or phase. Then $\lambda^k = r^k e^{ik\theta}$. Since $|e^{ik\theta}| = 1$, we have $|\lambda^k| = |\lambda|^k$, and so the solutions (9.3) have modulus $|u^{(k)}| = |\lambda^k a| = |\lambda|^k\,|a|$. As a result, $u^{(k)}$ will remain bounded if and only if $|\lambda| \le 1$, and will tend to zero as $k \to \infty$ if and only if $|\lambda| < 1$.

We  have  thus  established  the  basic  stability  criteria  for  scalar,  linear  systems. 

Theorem 9.3. The zero solution to a (real or complex) scalar iterative system is

(a) asymptotically stable if and only if $|\lambda| < 1$,

(b) stable if and only if $|\lambda| \le 1$,

(c) unstable if and only if $|\lambda| > 1$.
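The stability criterion is easy to experiment with numerically. The following Python sketch classifies the zero solution from the modulus of the coefficient and generates the first few iterates; the function names and the tolerance `tol`, used to decide when a floating-point modulus counts as equal to 1, are assumptions of this illustration rather than part of the theorem.

```python
import math

def classify_zero_solution(lam, tol=1e-12):
    """Classify the zero solution of u(k+1) = lam * u(k) by the modulus of lam."""
    r = abs(lam)  # works for both real and complex coefficients
    if math.isclose(r, 1.0, rel_tol=0.0, abs_tol=tol):
        return "stable"  # stable, but not asymptotically stable
    return "asymptotically stable" if r < 1.0 else "unstable"

def iterates(lam, a, n):
    """Return the list [u(0), u(1), ..., u(n)] with u(0) = a."""
    us = [a]
    for _ in range(n):
        us.append(lam * us[-1])
    return us

if __name__ == "__main__":
    for lam in (0.5, -0.9, 1, -1, 1j, 2, 1 - 2j):
        print(lam, classify_zero_solution(lam), iterates(lam, 1, 3))
```

Note that the label "stable" here marks the borderline case $|\lambda| = 1$; the asymptotically stable case $|\lambda| < 1$ is, of course, also stable.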


Exercises 

9.1.1.  Suppose  =  1.  Find  iVtiVV  and  when  (a)  =  2u^k\ 

(b)  u{k+1)  =  -  .9u{k\  (c)  M(fe+1)  =  \u{k\  (d)  u{k+1)  =  (l-2i)W(fe). 

Is  the  system  stable  or  unstable?  If  stable,  is  it  asymptotically  stable? 

9.1.2.  A  bank  offers  3.25%  interest  compounded  yearly.  Suppose  you  deposit  $100.  (a)  Set  up 
a  linear  iterative  equation  to  represent  your  bank  balance,  (b)  How  much  money  do  you 
have  after  10  years?  (c)  What  if  the  interest  is  compounded  monthly? 

9.1.3.  Show  that  the  yearly  balances  of  an  account  whose  interest  is  compounded  monthly 
satisfy  a  linear  iterative  system.  How  is  the  effective  yearly  interest  rate  determined  from 
the  original  annual  interest  rate? 

9.1.4.  Show  that,  as  the  time  interval  of  compounding  goes  to  zero,  the  bank  balance  after  k 
years  approaches  an  exponential  function  erk  a,  where  r  is  the  yearly  interest  rate  and  a  is 
the  initial  balance. 

9.1.5. For which values of $\lambda$ does the scalar iterative system (9.2) have a periodic solution, meaning that $u^{(k+m)} = u^{(k)}$ for some $m$?

9.1.6. Consider the iterative systems $u^{(k+1)} = \lambda\,u^{(k)}$ and $v^{(k+1)} = \mu\,v^{(k)}$, where $|\lambda| > |\mu|$. Prove that, for all nonzero initial data $u^{(0)} = a \ne 0$, $v^{(0)} = b \ne 0$, the solution to the first is eventually larger (in modulus) than that of the second: $|u^{(k)}| > |v^{(k)}|$, for $k \gg 0$.


9.1.7. Let $u(t)$ denote the solution to the linear ordinary differential equation $\dot{u} = \beta u$, $u(0) = a$. Let $h > 0$. Show that the sample values $u^{(k)} = u(kh)$ satisfy a linear iterative system. What is the coefficient $\lambda$? Compare the stability properties of the differential equation and the corresponding iterative system.


♣ 9.1.8. Investigate the solutions of the linear iterative equation $u^{(k+1)} = \lambda\,u^{(k)}$ when $\lambda$ is a complex number with $|\lambda| = 1$, and look for patterns.


9.1.9. Let $\lambda, c \in \mathbb{R}$. Solve the affine or inhomogeneous linear iterative equation

\[ u^{(k+1)} = \lambda\,u^{(k)} + c, \qquad u^{(0)} = a. \tag{9.5} \]

Discuss the possible behaviors of the solutions. Hint: Write the solution in the form $u^{(k)} = u^\star + v^{(k)}$, where $u^\star$ is the equilibrium solution.


9.1.10.  A  bank  offers  5%  interest  compounded  yearly.  Suppose  you  deposit  $120  in  the  account 
each  year.  Set  up  an  affine  iterative  equation  (9.5)  to  represent  your  bank  balance.  How 
much  money  do  you  have  after  10  years?  After  you  retire  in  50  years?  After  200  years? 


9.1.11.  Redo  Exercise  9.1.10  in  the  case  that  the  interest  is  compounded  monthly  and  you 
deposit  $10  each  month. 

C  9.1.12.  Each  spring,  the  deer  in  Minnesota  produce  offspring  at  a  rate  of  roughly  1.2  times  the 
total  population,  while  approximately  5%  of  the  population  dies  as  a  result  of  predators 
and  natural  causes.  In  the  fall,  hunters  are  allowed  to  shoot  3,600  deer.  This  winter  the 
Department  of  Natural  Resources  (DNR)  estimates  that  there  are  20,000  deer.  Set  up  an 
affine  iterative  equation  (9.5)  to  represent  the  deer  population  each  subsequent  year.  Solve 
the  system  and  find  the  population  in  the  next  5  years.  How  many  deer  in  the  long  term 
will  there  be?  Using  this  information,  formulate  a  reasonable  policy  of  how  many  deer 
hunting  licenses  the  DNR  should  allow  each  fall,  assuming  one  kill  per  license. 


Powers  of  Matrices 

The solution to the general linear iterative system

\[ \mathbf{u}^{(k+1)} = T\,\mathbf{u}^{(k)}, \qquad \mathbf{u}^{(0)} = \mathbf{a}, \tag{9.6} \]

is also, at least at first glance, immediate. Clearly,

\[ \mathbf{u}^{(1)} = T\mathbf{u}^{(0)} = T\mathbf{a}, \qquad \mathbf{u}^{(2)} = T\mathbf{u}^{(1)} = T^2\mathbf{a}, \qquad \mathbf{u}^{(3)} = T\mathbf{u}^{(2)} = T^3\mathbf{a}, \]

and, in general,

\[ \mathbf{u}^{(k)} = T^k\,\mathbf{a}. \tag{9.7} \]

Thus, the iterates are simply determined by multiplying the initial vector $\mathbf{a}$ by the successive powers of the coefficient matrix $T$. And so, in contrast to differential equations, proving the existence and uniqueness of solutions to an iterative system is completely trivial.

However, unlike real or complex scalars, the general formulas for and qualitative behavior of the powers of a square matrix are not nearly so immediately apparent. (Before continuing, the reader is urged to experiment with simple $2 \times 2$ matrices, trying to detect patterns.) To make progress, recall how, in Section 8.1, we endeavored to solve linear systems of differential equations by suitably adapting the known exponential solution from the scalar version. In the iterative case, the scalar solution formula (9.3) is written in terms of powers, not exponentials. This motivates us to try the power ansatz

\[ \mathbf{u}^{(k)} = \lambda^k\,\mathbf{v}, \tag{9.8} \]

in which $\lambda$ is a scalar and $\mathbf{v}$ a vector, as a possible solution to the system. We find

\[ \mathbf{u}^{(k+1)} = \lambda^{k+1}\,\mathbf{v}, \qquad \text{while} \qquad T\mathbf{u}^{(k)} = T(\lambda^k \mathbf{v}) = \lambda^k\,T\mathbf{v}. \]

These two expressions will be equal if and only if

\[ T\mathbf{v} = \lambda\,\mathbf{v}. \]

This is precisely the defining eigenvalue equation (8.12), and thus, (9.8) is a nontrivial solution to (9.6) if and only if $\lambda$ is an eigenvalue of the coefficient matrix $T$ and $\mathbf{v} \ne \mathbf{0}$ an associated eigenvector.

Thus, for each eigenvector and eigenvalue of the coefficient matrix, we can construct a solution to the iterative system. We can then appeal to linear superposition, as in Theorem 7.30, to combine the basic eigensolutions to form more general solutions. In particular, if the coefficient matrix is complete, this method will produce the general solution.


Theorem 9.4. If the coefficient matrix $T$ is complete, then the general solution to the linear iterative system $\mathbf{u}^{(k+1)} = T\,\mathbf{u}^{(k)}$ is given by

\[ \mathbf{u}^{(k)} = c_1 \lambda_1^k \mathbf{v}_1 + c_2 \lambda_2^k \mathbf{v}_2 + \cdots + c_n \lambda_n^k \mathbf{v}_n, \tag{9.9} \]

where $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are the linearly independent eigenvectors and $\lambda_1, \ldots, \lambda_n$ the corresponding eigenvalues of $T$. The coefficients $c_1, \ldots, c_n$ are arbitrary scalars and are uniquely prescribed by the initial conditions $\mathbf{u}^{(0)} = \mathbf{a}$.


Proof: Since we already know, by linear superposition, that (9.9) is a solution to the system for arbitrary $c_1, \ldots, c_n$, it suffices to show that we can match any prescribed initial conditions. To this end, we need to solve the linear system

\[ \mathbf{u}^{(0)} = c_1 \mathbf{v}_1 + \cdots + c_n \mathbf{v}_n = \mathbf{a}. \tag{9.10} \]

Completeness of $T$ implies that its eigenvectors form a basis of $\mathbb{C}^n$, and hence (9.10) always admits a solution. In matrix form, we can rewrite (9.10) as

\[ S\,\mathbf{c} = \mathbf{a}, \qquad \text{so that} \qquad \mathbf{c} = S^{-1}\mathbf{a}, \qquad \text{where} \qquad S = (\,\mathbf{v}_1\ \ \mathbf{v}_2\ \ \ldots\ \ \mathbf{v}_n\,) \]

is the (nonsingular) matrix whose columns are the eigenvectors. Q.E.D.

Solutions  in  the  incomplete  cases  are  more  complicated  to  write  down,  and  rely  on  the 
Jordan  bases  of  Section  8.6;  see  Exercise  9.1.40. 
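As a numerical check of the eigenvector expansion, the following Python sketch treats the simplest case of a real $2 \times 2$ coefficient matrix with two distinct real eigenvalues, and compares the resulting formula with direct iteration. The function names are hypothetical, the restriction to distinct real eigenvalues is an assumption of this sketch, and the expansion coefficients are found by Cramer's rule rather than by forming an inverse matrix.

```python
import math

def apply(T, u):
    """Multiply the 2 x 2 matrix T by the vector u."""
    return [T[0][0] * u[0] + T[0][1] * u[1], T[1][0] * u[0] + T[1][1] * u[1]]

def iterate(T, a, k):
    """Compute u(k) = T^k a by repeated multiplication."""
    u = list(a)
    for _ in range(k):
        u = apply(T, u)
    return u

def eigen_solution(T, a, k):
    """Compute u(k) = c1 lam1^k v1 + c2 lam2^k v2 for a 2 x 2 matrix T."""
    (p, q), (r, s) = T
    tr, det = p + s, p * s - q * r
    disc = tr * tr - 4.0 * det
    if disc <= 0.0:
        raise ValueError("this sketch assumes two distinct real eigenvalues")
    root = math.sqrt(disc)
    lams = [(tr + root) / 2.0, (tr - root) / 2.0]
    vecs = []
    for lam in lams:
        # A nonzero solution of (T - lam I) v = 0.
        if q != 0.0:
            vecs.append([q, lam - p])
        elif r != 0.0:
            vecs.append([lam - s, r])
        else:  # T is diagonal
            vecs.append([1.0, 0.0] if abs(lam - p) < abs(lam - s) else [0.0, 1.0])
    v1, v2 = vecs
    # Solve c1 v1 + c2 v2 = a by Cramer's rule.
    det_s = v1[0] * v2[1] - v2[0] * v1[1]
    c1 = (a[0] * v2[1] - v2[0] * a[1]) / det_s
    c2 = (v1[0] * a[1] - v1[1] * a[0]) / det_s
    return [c1 * lams[0] ** k * v1[i] + c2 * lams[1] ** k * v2[i] for i in range(2)]

if __name__ == "__main__":
    T = [[0.6, 0.2], [0.2, 0.6]]
    print(iterate(T, [1.0, 0.0], 5), eigen_solution(T, [1.0, 0.0], 5))
```

The two results agree up to roundoff; for large $k$ the eigenvalue formula requires far fewer operations than repeated multiplication.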

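Example 9.5. Consider the iterative system

\[ x^{(k+1)} = \tfrac{3}{5}\,x^{(k)} + \tfrac{1}{5}\,y^{(k)}, \qquad y^{(k+1)} = \tfrac{1}{5}\,x^{(k)} + \tfrac{3}{5}\,y^{(k)}, \tag{9.11} \]

with initial conditions

\[ x^{(0)} = a, \qquad y^{(0)} = b. \tag{9.12} \]

The system can be rewritten in our matrix form (9.6), with

\[ T = \begin{pmatrix} .6 & .2 \\ .2 & .6 \end{pmatrix}, \qquad \mathbf{u}^{(k)} = \begin{pmatrix} x^{(k)} \\ y^{(k)} \end{pmatrix}, \qquad \mathbf{a} = \begin{pmatrix} a \\ b \end{pmatrix}. \]

Solving the characteristic equation

\[ \det(T - \lambda I) = \lambda^2 - 1.2\,\lambda + .32 = 0 \]

produces the eigenvalues $\lambda_1 = .8$, $\lambda_2 = .4$. We then solve the associated linear systems $(T - \lambda_j I)\mathbf{v}_j = \mathbf{0}$ for the corresponding eigenvectors:

\[ \lambda_1 = .8, \quad \mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \lambda_2 = .4, \quad \mathbf{v}_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}. \]

Therefore, the basic eigensolutions are

\[ \mathbf{u}_1^{(k)} = (.8)^k \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \mathbf{u}_2^{(k)} = (.4)^k \begin{pmatrix} -1 \\ 1 \end{pmatrix}. \]

Theorem 9.4 tells us that the general solution is given as a linear combination,

\[ \mathbf{u}^{(k)} = c_1 \mathbf{u}_1^{(k)} + c_2 \mathbf{u}_2^{(k)} = c_1 (.8)^k \begin{pmatrix} 1 \\ 1 \end{pmatrix} + c_2 (.4)^k \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} c_1 (.8)^k - c_2 (.4)^k \\ c_1 (.8)^k + c_2 (.4)^k \end{pmatrix}, \]

where $c_1, c_2$ are determined by the initial conditions

\[ \mathbf{u}^{(0)} = \begin{pmatrix} c_1 - c_2 \\ c_1 + c_2 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}, \qquad \text{and hence} \qquad c_1 = \frac{a + b}{2}, \quad c_2 = \frac{b - a}{2}. \]

Therefore, the explicit formula for the solution to the initial value problem (9.11-12) is

\[ x^{(k)} = (.8)^k\,\frac{a + b}{2} + (.4)^k\,\frac{a - b}{2}, \qquad y^{(k)} = (.8)^k\,\frac{a + b}{2} + (.4)^k\,\frac{b - a}{2}. \]

In particular, as $k \to \infty$, the iterates $\mathbf{u}^{(k)} \to \mathbf{0}$ converge to zero at a rate governed by the dominant eigenvalue $\lambda_1 = .8$. Figure 9.2 illustrates the cumulative effect of the iteration; the initial data is colored orange, and successive iterates are colored green, blue, purple, red. The initial conditions consist of a large number of points on the unit circle $x^2 + y^2 = 1$, which are successively mapped to points on progressively smaller and flatter ellipses, that shrink down towards the origin.

[Figure 9.2. Stable Iterative System.]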


Example 9.6. The Fibonacci numbers are defined by the second order† scalar iterative equation

\[ u^{(k+2)} = u^{(k+1)} + u^{(k)} \tag{9.13} \]


with initial conditions

\[ u^{(0)} = a, \qquad u^{(1)} = b. \tag{9.14} \]

In short, to obtain the next Fibonacci number, add the previous two. The classical Fibonacci integers start with $a = 0$, $b = 1$; the next few are

\[ u^{(0)} = 0, \quad u^{(1)} = 1, \quad u^{(2)} = 1, \quad u^{(3)} = 2, \quad u^{(4)} = 3, \quad u^{(5)} = 5, \quad u^{(6)} = 8, \quad u^{(7)} = 13, \quad \ldots \]

† In general, an iterative system $\mathbf{u}^{(k+j)} = T_1 \mathbf{u}^{(k+j-1)} + \cdots + T_j \mathbf{u}^{(k)}$ in which the new iterate depends upon the preceding $j$ values is said to have order $j$.

The Fibonacci integers occur in a surprising variety of natural objects, including leaves, flowers, and fruit, [83]. They were originally introduced by the twelfth-/thirteenth-century Italian mathematician Leonardo Pisano Fibonacci as a crude model of the growth of a population of rabbits. In Fibonacci's model, the $k$th Fibonacci number measures the total number of pairs of rabbits at year $k$. We start the process with a single juvenile pair‡ at year 0. Once a year, each pair of rabbits produces a new pair of offspring, but it takes a full year for a rabbit pair to mature enough to produce offspring of their own.

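Every higher order iterative equation can be replaced by an equivalent first order iterative system. In this particular case, we define the vector

\[ \mathbf{u}^{(k)} = \begin{pmatrix} u^{(k)} \\ u^{(k+1)} \end{pmatrix} \in \mathbb{R}^2, \]

and note that (9.13) is equivalent to the matrix system

\[ \begin{pmatrix} u^{(k+1)} \\ u^{(k+2)} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} u^{(k)} \\ u^{(k+1)} \end{pmatrix}, \qquad \text{or} \qquad \mathbf{u}^{(k+1)} = T\,\mathbf{u}^{(k)}, \qquad \text{where} \qquad T = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}. \]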

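To find the explicit formula for the Fibonacci numbers, we must determine the eigenvalues and eigenvectors of the coefficient matrix $T$. A straightforward computation produces

\[ \lambda_1 = \frac{1 + \sqrt{5}}{2} = 1.618034\ldots, \qquad \lambda_2 = \frac{1 - \sqrt{5}}{2} = -.618034\ldots, \]

\[ \mathbf{v}_1 = \begin{pmatrix} \dfrac{-1 + \sqrt{5}}{2} \\ 1 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} \dfrac{-1 - \sqrt{5}}{2} \\ 1 \end{pmatrix}. \]

Therefore, according to (9.9), the general solution to the Fibonacci system is

\[ \mathbf{u}^{(k)} = \begin{pmatrix} u^{(k)} \\ u^{(k+1)} \end{pmatrix} = c_1 \left( \frac{1 + \sqrt{5}}{2} \right)^{k} \begin{pmatrix} \dfrac{-1 + \sqrt{5}}{2} \\ 1 \end{pmatrix} + c_2 \left( \frac{1 - \sqrt{5}}{2} \right)^{k} \begin{pmatrix} \dfrac{-1 - \sqrt{5}}{2} \\ 1 \end{pmatrix}. \tag{9.15} \]

The initial data

\[ \mathbf{u}^{(0)} = c_1 \begin{pmatrix} \dfrac{-1 + \sqrt{5}}{2} \\ 1 \end{pmatrix} + c_2 \begin{pmatrix} \dfrac{-1 - \sqrt{5}}{2} \\ 1 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix} \]

uniquely specifies the coefficients

\[ c_1 = \frac{2a + (1 + \sqrt{5})\,b}{2\sqrt{5}}, \qquad c_2 = -\,\frac{2a + (1 - \sqrt{5})\,b}{2\sqrt{5}}. \]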

The first entry of the solution vector (9.15) produces the explicit formula

\[ u^{(k)} = \frac{(-1 + \sqrt{5})\,a + 2b}{2\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{k} + \frac{(1 + \sqrt{5})\,a - 2b}{2\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{k} \tag{9.16} \]

for the $k$th Fibonacci number. For the particular initial conditions $a = 0$, $b = 1$, (9.16) reduces to the classical Binet formula

\[ u^{(k)} = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^{k} - \left( \frac{1 - \sqrt{5}}{2} \right)^{k} \right]. \tag{9.17} \]

‡ Fibonacci ignores some pertinent details like the sex of the offspring.

[Figure 9.3. Fibonacci Iteration.]

It is a remarkable fact that, for every value of $k$, all the $\sqrt{5}$'s cancel out, and the Binet formula (9.17) does indeed produce the Fibonacci integers listed above. Another useful observation is that, since

\[ 0 < |\lambda_2| = \frac{\sqrt{5} - 1}{2} < 1 < \lambda_1 = \frac{1 + \sqrt{5}}{2}, \]

the terms involving $\lambda_1^k$ go to $\infty$ (and so the zero solution to this iterative system is unstable) while the terms involving $\lambda_2^k$ go to zero. Therefore, even for $k$ moderately large, the first term in (9.16) is an excellent approximation to the $k$th Fibonacci number, and one that gets more and more accurate as $k$ increases. A plot of the first 4 iterates, starting with the initial data consisting of equally spaced points on the unit circle, appears in Figure 9.3. As in the previous example, the circle is mapped to a sequence of progressively more eccentric ellipses; however, their major semi-axes become more and more stretched out, and almost all points end up going off to $\infty$ in the direction of the dominant eigenvector $\mathbf{v}_1$.

The dominant eigenvalue $\lambda_1 = \frac{1}{2}(1 + \sqrt{5}) = 1.6180339\ldots$ is known as the golden ratio and plays an important role in spiral growth in nature, as well as in art, architecture, and design, [83]. It describes the overall growth rate of the Fibonacci integers, and, in fact, of every sequence of Fibonacci numbers with initial conditions $b \ne \frac{1}{2}(1 - \sqrt{5})\,a$.
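The explicit formula can be checked against the defining recursion with a few lines of Python. The sketch below evaluates the closed form $u^{(k)} = \frac{(-1+\sqrt5)a + 2b}{2\sqrt5}\,\lambda_1^k + \frac{(1+\sqrt5)a - 2b}{2\sqrt5}\,\lambda_2^k$ in floating point, so it is reliable only for moderate $k$; the function names are assumptions of this illustration.

```python
import math

def fibonacci_iter(k, a=0, b=1):
    """k-th term of u(k+2) = u(k+1) + u(k) with u(0) = a, u(1) = b."""
    u, v = a, b
    for _ in range(k):
        u, v = v, u + v
    return u

def fibonacci_closed_form(k, a=0, b=1):
    """Evaluate the eigenvalue formula for the k-th term in floating point."""
    s5 = math.sqrt(5.0)
    lam1 = (1.0 + s5) / 2.0   # dominant eigenvalue (golden ratio)
    lam2 = (1.0 - s5) / 2.0   # subdominant eigenvalue, of modulus less than 1
    coeff1 = ((-1.0 + s5) * a + 2.0 * b) / (2.0 * s5)
    coeff2 = ((1.0 + s5) * a - 2.0 * b) / (2.0 * s5)
    return coeff1 * lam1 ** k + coeff2 * lam2 ** k

if __name__ == "__main__":
    for k in range(10):
        print(k, fibonacci_iter(k), round(fibonacci_closed_form(k)))
```

With $a = 0$, $b = 1$ the closed form is the Binet formula; rounding its floating-point value recovers the Fibonacci integers as long as roundoff stays below one half.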

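Example 9.7. Let $T = \begin{pmatrix} -3 & 1 & 6 \\ 1 & -1 & -2 \\ -1 & -1 & 0 \end{pmatrix}$ be the coefficient matrix for a three-dimensional iterative system $\mathbf{u}^{(k+1)} = T\,\mathbf{u}^{(k)}$. Its eigenvalues and corresponding eigenvectors are

\[ \lambda_1 = -2, \quad \mathbf{v}_1 = \begin{pmatrix} 4 \\ -2 \\ 1 \end{pmatrix}, \qquad \lambda_2 = -1 + i, \quad \mathbf{v}_2 = \begin{pmatrix} 2 - i \\ -1 \\ 1 \end{pmatrix}, \qquad \lambda_3 = -1 - i, \quad \mathbf{v}_3 = \begin{pmatrix} 2 + i \\ -1 \\ 1 \end{pmatrix}. \]

Therefore, according to (9.9), the general complex solution is

\[ \mathbf{u}^{(k)} = b_1 (-2)^k \begin{pmatrix} 4 \\ -2 \\ 1 \end{pmatrix} + b_2 (-1 + i)^k \begin{pmatrix} 2 - i \\ -1 \\ 1 \end{pmatrix} + b_3 (-1 - i)^k \begin{pmatrix} 2 + i \\ -1 \\ 1 \end{pmatrix}, \]

where $b_1, b_2, b_3$ are arbitrary complex scalars.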

If we are interested only in real solutions, we can break up any complex solution into its real and imaginary parts, each of which constitutes a real solution. (This is a manifestation of the general Reality Principle of Theorem 7.48, but is not hard to prove directly.) We begin by writing $\lambda_2 = -1 + i = \sqrt{2}\,e^{3\pi i/4}$ in polar form, and hence

\[ (-1 + i)^k = 2^{k/2} e^{3k\pi i/4} = 2^{k/2} \left( \cos \tfrac{3}{4} k\pi + i \sin \tfrac{3}{4} k\pi \right). \]

Therefore, the complex solution

\[ (-1 + i)^k \begin{pmatrix} 2 - i \\ -1 \\ 1 \end{pmatrix} = 2^{k/2} \begin{pmatrix} 2 \cos \tfrac{3}{4} k\pi + \sin \tfrac{3}{4} k\pi \\ -\cos \tfrac{3}{4} k\pi \\ \cos \tfrac{3}{4} k\pi \end{pmatrix} + i\,2^{k/2} \begin{pmatrix} 2 \sin \tfrac{3}{4} k\pi - \cos \tfrac{3}{4} k\pi \\ -\sin \tfrac{3}{4} k\pi \\ \sin \tfrac{3}{4} k\pi \end{pmatrix} \]

is a combination of two independent real solutions. The complex conjugate eigenvalue $\lambda_3 = -1 - i$ leads, as before, to the complex conjugate solution, and hence to the same two real solutions. The general real solution $\mathbf{u}^{(k)}$ to the system can be written as a linear combination of the three independent real solutions:

\[ c_1 (-2)^k \begin{pmatrix} 4 \\ -2 \\ 1 \end{pmatrix} + c_2\,2^{k/2} \begin{pmatrix} 2 \cos \tfrac{3}{4} k\pi + \sin \tfrac{3}{4} k\pi \\ -\cos \tfrac{3}{4} k\pi \\ \cos \tfrac{3}{4} k\pi \end{pmatrix} + c_3\,2^{k/2} \begin{pmatrix} 2 \sin \tfrac{3}{4} k\pi - \cos \tfrac{3}{4} k\pi \\ -\sin \tfrac{3}{4} k\pi \\ \sin \tfrac{3}{4} k\pi \end{pmatrix}, \tag{9.18} \]

where $c_1, c_2, c_3$ are arbitrary real scalars, uniquely prescribed by the initial conditions.


Diagonalization  and  Iteration 

An alternative, equally efficient approach to solving iterative systems is based on diagonalization of the coefficient matrix, cf. (8.30). Specifically, assuming the coefficient matrix $T$ is complete, we can factor it as a product

\[ T = S\,\Lambda\,S^{-1}, \tag{9.19} \]

in which $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ is the diagonal matrix containing the eigenvalues of $T$, while the columns of $S = (\,\mathbf{v}_1\ \cdots\ \mathbf{v}_n\,)$ are the corresponding eigenvectors. Consequently, the powers of $T$ are given by

\[ T^2 = (S \Lambda S^{-1})(S \Lambda S^{-1}) = S \Lambda^2 S^{-1}, \]

\[ T^3 = (S \Lambda S^{-1})(S \Lambda S^{-1})(S \Lambda S^{-1}) = S \Lambda^3 S^{-1}, \]

and, in general,

\[ T^k = S\,\Lambda^k\,S^{-1}. \tag{9.20} \]

Moreover, since $\Lambda$ is a diagonal matrix, its powers are trivial to compute:

\[ \Lambda^k = \operatorname{diag}(\lambda_1^k, \ldots, \lambda_n^k). \tag{9.21} \]

Thus, by combining (9.20-21), we obtain an explicit formula for the powers of a complete matrix $T$. Furthermore, the solution to the associated linear iterative system

\[ \mathbf{u}^{(k+1)} = T\,\mathbf{u}^{(k)}, \quad \mathbf{u}^{(0)} = \mathbf{a}, \qquad \text{is given by} \qquad \mathbf{u}^{(k)} = T^k \mathbf{a} = S\,\Lambda^k\,S^{-1}\mathbf{a}. \tag{9.22} \]

You should convince yourself that this gives precisely the same solution as before. Computationally, there is not a significant difference between the two solution methods, and the choice is left to the discretion of the user.

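Example 9.8. Suppose $T = \begin{pmatrix} 7 & 6 \\ -9 & -8 \end{pmatrix}$. Its eigenvalues and eigenvectors are readily
computed:

$$\lambda_1 = -2, \quad \mathbf{v}_1 = \begin{pmatrix} -2 \\ 3 \end{pmatrix}, \qquad \lambda_2 = 1, \quad \mathbf{v}_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

We assemble these into the diagonal eigenvalue matrix $\Lambda$ and the eigenvector matrix $S$,
given by

$$\Lambda = \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix}, \qquad S = \begin{pmatrix} -2 & -1 \\ 3 & 1 \end{pmatrix},$$

whence

$$\begin{pmatrix} 7 & 6 \\ -9 & -8 \end{pmatrix} = T = S \Lambda S^{-1} = \begin{pmatrix} -2 & -1 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -3 & -2 \end{pmatrix},$$

as you can readily check. Therefore, according to (9.20),

$$T^k = S \Lambda^k S^{-1} = \begin{pmatrix} -2 & -1 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} (-2)^k & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -3 & -2 \end{pmatrix} = \begin{pmatrix} 3 - 2(-2)^k & 2 - 2(-2)^k \\ -3 + 3(-2)^k & -2 + 3(-2)^k \end{pmatrix}.$$

You may wish to check this formula directly for the first few values of $k = 1, 2, \ldots$. As a
result, the solution to the particular iterative system

$$\mathbf{u}^{(k+1)} = \begin{pmatrix} 7 & 6 \\ -9 & -8 \end{pmatrix} \mathbf{u}^{(k)}, \qquad \mathbf{u}^{(0)} = \begin{pmatrix} 1 \\ 1 \end{pmatrix},$$

is

$$\mathbf{u}^{(k)} = T^k \mathbf{u}^{(0)} = \begin{pmatrix} 5 - 4(-2)^k \\ -5 + 6(-2)^k \end{pmatrix}.$$

In this case, the eigenvalue $\lambda_1 = -2$ causes an instability, with solutions having arbitrarily
large norm as $k \to \infty$.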
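The closed-form power in Example 9.8 can be checked mechanically. The sketch below, written in plain Python with exact integer arithmetic, compares $S \Lambda^k S^{-1}$ against repeated multiplication and against the stated solution for $\mathbf{u}^{(0)} = (1, 1)^T$. The function names are illustrative, and the matrix entries are those of the example as reconstructed from its factorization.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_pow(A, k):
    """k-th power (k >= 0) of a square matrix by repeated multiplication."""
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = mat_mul(P, A)
    return P

T = [[7, 6], [-9, -8]]
S = [[-2, -1], [3, 1]]        # columns are the eigenvectors v1, v2
S_inv = [[1, 1], [-3, -2]]    # inverse of S (det S = 1)

def power_by_diagonalization(k):
    """T^k = S Lambda^k S^{-1}, with Lambda = diag(-2, 1)."""
    Lambda_k = [[(-2) ** k, 0], [0, 1]]
    return mat_mul(mat_mul(S, Lambda_k), S_inv)

def solution(k, a=(1, 1)):
    """u^(k) = T^k a, computed through the diagonalization formula (9.22)."""
    Tk = power_by_diagonalization(k)
    return [Tk[0][0] * a[0] + Tk[0][1] * a[1],
            Tk[1][0] * a[0] + Tk[1][1] * a[1]]

for k in range(5):
    print(k, solution(k))
```

Because every quantity is an integer, the comparison with direct powers is exact rather than approximate.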


Exercises 


9.1.13.  Find  the  explicit  formula  for  the  solution  to  the  following  linear  iterative  systems: 

/  \  (fc+1)  (fc)  0  (fc)  (fc+1)  o  (fc)  i  (fc) 

(a)  uK  ’  =  uy  ’  —  2vK  ’ ,  v v  ’  =  —  2uy  ’  +  vy  ’ 

2,  v 

,(0)  _ 


u<°>  =  i,  „(0)  =  o. 


(b)  u'  1  '  =  u 


(fe+i)  =  u(k) 
(c)  w(fe+1)  --(fe) 


(0)  _ 


u 

.(0)  _ 


2  (fc)  (fc+1)  1  (fc)  1  (fc) 

vVy  ,  V v  y  =  ^Uy  ’  —  qVk  ’ 

uy  /  —  v^k\  =  —u^  +  5v^k\  uyKJJ  =  1,  v 

(d)  u{k+1)  =  \u{k)  +  a(fc),  v(fc+1)  =  a(fc)  -2w{k) 

u 

(  \  (fc+1)  (fc)  I  O  (fc)  (fc)  (fc+1)  (fc)  |  <7  (fc)  /I  (fc) 

(e)  uy  ’  =  —  av  y  +  2av  ’  —  wy  ,  vy  ’  =  —  oir  '  +  7ir  7  —  4icv  ' 


(0)  =  3. 

0. 


W 
.(0)  _ 


3™ 


(fc) 


=  1,  V(°>  =  -1,  =  1. 


' ,  tt”  1  7  =  —  6uy 

(fc+1)  n  (fc)  |  (fc)  a  (fc) 
wy  J  =  —  6ir  y  +ofv  '  —  4ie  ' 


,(0)  _ 


ux  '  =  0,  =  1,  =  3. 


,(0)  _ 


9.1.14.  Find  the  explicit  formula  for  the  general  solution  to  the  linear  iterative  systems  with 
the  following  coefficient  matrices: 


(a) $\begin{pmatrix} -1 & 2 \\ 1 & 1 \end{pmatrix}$,   (b) $\begin{pmatrix} -2 & 7 \\ -1 & 3 \end{pmatrix}$,


(c) 


(  —3 

2 

—2  \ 

/ 

5 

6 

1 

3 

+  ) 

-6 

4 

-3 

,  (rf) 

0 

1 

2 

1 

3 

V  12 

-6 

-S) 

V 

1 

-1 

2  J 

3  7 

9.1.15. Prove that all the Fibonacci integers $u^{(k)}$, $k \geq 0$, can be found by just computing the
first term in the Binet formula (9.17) and then rounding off to the nearest integer.

9.1.16. The $k$th Lucas number is defined as $L^{(k)} = \left( \dfrac{1 + \sqrt5}{2} \right)^{k} + \left( \dfrac{1 - \sqrt5}{2} \right)^{k}$.
(a) Explain why the Lucas numbers satisfy the Fibonacci iterative equation
$L^{(k+2)} = L^{(k+1)} + L^{(k)}$. (b) Write down the first 7 Lucas numbers.
(c) Prove that every Lucas number is a positive integer.

9.1.17. What happens to the Fibonacci integers $u^{(k)}$ if we go "backward in time", i.e., for
$k < 0$? How is $u^{(-k)}$ related to $u^{(k)}$?

9.1.18. Use formula (9.20) to compute the $k$th power of the following matrices:
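(a) $\begin{pmatrix} 5 & 2 \\ 2 & 2 \end{pmatrix}$,   (b) $\begin{pmatrix} 4 & 1 \\ -2 & 1 \end{pmatrix}$,   (c) $\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$,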


(d) 


(1 

1 

2\ 

(  o 

1 

1 

2 

1 

,  (e) 

0 

0 

2 

1 

i  J 

Cl 

0 

9.1.19.  Use  your  answer  from  Exercise  9.1.18  to  solve  the  following  iterative  systems: 

(a)  u(k+1)  =  5u(k)  +  2v(k\  v{k+1)  =  2u{k)  +  2v{k) ,  w(0)  =  -1. 

(b)  u(fe+1)  =  4 


( k )  ,  0  (k) 

UK  J  +  2vK  ’ 

(k)  ,  (k) 

Uy  ’  +  VK  '  , 


(fc+1)  o  (k)  |  0  (k) 
,  vy  J  =  2uK  ’  +  2vK  ’ 

(fc+1)  0  (k)  .  (k) 

vK  ’  =  —  2uK  ’  +  vK  ’ , 


u 

=  1, 


v(0)  =  0, 
t>(0)  =  -3, 


(fe+!)  =u(fc)  +„(*) 


(c)  u 

(d)  u(fe+1)  =  w(fe)  Wfe)+2w(fe) 


(fc+1)  (fc)  ,  (fc) 

(fc+1)  (fc) 

J  =  UK  J 


u(0)  =  0, 


V 


(0)  _ 


=  2. 


.  0  (fc)  ,  (fc) 

+  2vK  ’  +  wy  ’ 


(fc+l)  0  (fc)  |  (fc)  |  (k) 

wy  ’  =  2uy  ’  +  vy  ’  +  wy  ’ 


(e)  u 


(k+ 1)  =  y{k) 


O+l)  (fc)  (fc+i) 
tr  '  =  Wy  \  Wy  J  = 


(fc)  .  0  (k) 

id  ’  +  2wy  J 


u 


ti<°>  =  1. 

(0)  _  i 


(0)  n  (0) 
tr  ’  =  0,  wy  J  = 


V 


(0)  _ 


0,  w 


(0)  _ 


0. 


9.1.20. (a) Given initial data $\mathbf{u}^{(0)} = (1, 1, 1)^T$, explain why the resulting solution $\mathbf{u}^{(k)}$ to the
system in Example 9.7 has all integer entries. (b) Find the coefficients $c_1, c_2, c_3$ in the
explicit solution formula (9.18). (c) Check the first few iterates to convince yourself that
the solution formula does, in spite of appearances, always give an integer value.

9.1.21. (a) Show how to convert the higher order linear iterative equation

$$u^{(k+j)} = c_1 u^{(k+j-1)} + c_2 u^{(k+j-2)} + \cdots + c_j u^{(k)}$$

into a first order system $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$. Hint: See Example 9.6.

(b) Write down initial conditions that guarantee a unique solution $u^{(k)}$ for all $k \geq 0$.


9.1.22.  Apply  the  method  of  Exercise  9.1.21  to  solve  the  following  iterative  equations: 


(a) 

(b) 

(c) 

(d) 

(e) 

(f) 


M(fe+2)  =  _u(fe+l) 


+  2u(fe) 

12u(fe+2)  =  M(fe+1)  +u(fe) 


u 


(fe+2)  =4u(fe+1)  +M(fe), 

(fe+2)  =  2M(fe+1)  -  2w(fe) 


u 


«<°>  =  1, 

y°)  =  -i, 
(0)  =  x 
(0)  _ 


U 


(1)  _ 


u 


Us  '  =  1, 

-  2u(fe), 

(fe+3)  =  M(fe+2)  +  2u(fe+1)  -  2u(fe), 


=  2. 

tt(1)  =  2. 

w  =  -l. 
(1)  _ 


M(fe+3)  =  2u(fe+2)  +M(fe+1) 


u 


=  3. 

IT'7  =  0, 

u(0)  =  0, 


u 

(0)  _ 


u 


u 


(1) 

(1) 


=  2, 

=  1, 


w(2)  =  3. 

=  1. 


9.1.23. Suppose you have $n$ dollars and can buy coffee for \$1, milk for \$2, and orange juice for
\$2. Let $C^{(n)}$ count the number of different ways of spending all your money. (a) Explain
why $C^{(n)} = C^{(n-1)} + 2\,C^{(n-2)}$, $C^{(0)} = C^{(1)} = 1$. (b) Find an explicit formula for $C^{(n)}$.

9.1.24. Find the general solution to the iterative system $u_i^{(k+1)} = u_{i-1}^{(k)} + u_{i+1}^{(k)}$, $i = 1, \ldots, n$,
where we set $u_0^{(k)} = u_{n+1}^{(k)} = 0$ for all $k$. Hint: Use Exercise 8.2.47.

♠ 9.1.25. Starting with $u^{(0)} = 0$, $u^{(1)} = 0$, $u^{(2)} = 1$, define the sequence of tribonacci numbers
$u^{(k)}$ by adding the previous three to get the next one. For instance,
$u^{(3)} = u^{(0)} + u^{(1)} + u^{(2)} = 1$. (a) Write out the next four tribonacci numbers. (b) Find
a third order iterative equation for the tribonacci numbers. (c) Explain why the
tribonacci numbers are all integers. (d) Find an explicit formula for the solution, using
a computer to approximate the eigenvalues. (e) Do they grow faster than the usual
Fibonacci numbers? What is their overall rate of growth?

♥ 9.1.26. Suppose that Fibonacci's rabbits live for only eight years, [44]. (a) Write out an
iterative equation to describe the rabbit population. (b) Write down the first few terms.
(c) Convert your equation into a first order iterative system, using the method of Exercise
9.1.21. (d) At what rate does the rabbit population grow?

♠ 9.1.27. A well-known method of generating a sequence of "pseudo-random" integers
$u^{(0)}, u^{(1)}, \ldots$ satisfying $0 \leq u^{(i)} < n$ is based on the modular Fibonacci equation
$u^{(k+2)} = u^{(k+1)} + u^{(k)} \bmod n$, with suitably chosen initial values $0 \leq u^{(0)}, u^{(1)} < n$.

(a) Generate the sequence of pseudo-random numbers that result from the choices $n = 10$,
$u^{(0)} = 3$, $u^{(1)} = 7$. Keep iterating until the sequence starts repeating.

(b) Experiment with other sequences of pseudo-random numbers generated by the method.

9.1.28. Prove that the curves $E_k = \{\, T^k \mathbf{x} \mid \| \mathbf{x} \| = 1 \,\}$, $k = 0, 1, 2, \ldots$, sketched in Figure 9.2
form a family of ellipses with the same principal axes. What are the individual semi-axes?
Hint: Use Exercise 8.7.23.


♣ 9.1.29. Plot the ellipses $E_k = \{\, T^k \mathbf{x} \mid \| \mathbf{x} \| = 1 \,\}$ for $k = 1, 2, 3, 4$ for the following matrices $T$.
Then determine their principal axes, semi-axes, and areas. Hint: Use Exercise 8.7.23.

9.1.30. Let $T$ be a positive definite $2 \times 2$ matrix. Let $E_n = \{\, T^n \mathbf{x} \mid \| \mathbf{x} \| = 1 \,\}$, $n = 0, 1, 2, \ldots$,
be the image of the unit circle under the $n$th power of $T$. (a) Prove that $E_n$ is an ellipse.
True or false: (b) The ellipses $E_n$ all have the same principal axes. (c) The semi-axes are
given by $r_n = r_1^n$, $s_n = s_1^n$. (d) The areas are given by $A_n = \pi \alpha^n$ where $\alpha = A_1 / \pi$.

9.1.31. Answer Exercise 9.1.30 when $T$ is an arbitrary nonsingular $2 \times 2$ matrix.
Hint: Use Exercise 8.7.23.

9.1.32. Given the general solution (9.9) of the iterative system $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$, write down the
solution to $\mathbf{v}^{(k+1)} = (\alpha T + \beta I)\, \mathbf{v}^{(k)}$, where $\alpha, \beta \in \mathbb{R}$.


♦ 9.1.33. Prove directly that if the coefficient matrix of a linear iterative system is real, both the
real and imaginary parts of a complex solution are real solutions.

♦ 9.1.34. Explain why the solution $\mathbf{u}^{(k)}$, $k \geq 0$, to the initial value problem (9.6) exists and is
uniquely defined. Does this hold if we allow negative $k < 0$?

9.1.35. Prove that if $T$ is a symmetric matrix, then the coefficients in (9.9) are given by the
formula $c_j = \mathbf{a}^T \mathbf{v}_j / \mathbf{v}_j^T \mathbf{v}_j$.

9.1.36. Explain why the $j$th column $\mathbf{c}_j^{(k)}$ of the matrix power $T^k$ satisfies the linear iterative
system $\mathbf{c}_j^{(k+1)} = T \mathbf{c}_j^{(k)}$ with initial data $\mathbf{c}_j^{(0)} = \mathbf{e}_j$, the $j$th standard basis vector.

9.1.37. Let $z^{(k+1)} = \lambda z^{(k)}$ be a complex scalar iterative equation with $\lambda = \mu + i \nu$. Show that
its real and imaginary parts $x^{(k)} = \operatorname{Re} z^{(k)}$, $y^{(k)} = \operatorname{Im} z^{(k)}$, satisfy a two-dimensional real
linear iterative system. Use the eigenvalue method to solve the real $2 \times 2$ system, and verify
that your solution coincides with the solution to the original complex equation.

♦ 9.1.38. Suppose $V \subset \mathbb{R}^n$ is an invariant subspace for the $n \times n$ matrix $T$ governing the linear
iterative system $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$. Prove that if $\mathbf{u}^{(0)} \in V$, then so is the solution: $\mathbf{u}^{(k)} \in V$.

9.1.39. Suppose $\mathbf{u}^{(k)}$ and $\tilde{\mathbf{u}}^{(k)}$ are two solutions to the same iterative system $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$.
(a) Suppose $\mathbf{u}^{(k_0)} = \tilde{\mathbf{u}}^{(k_0)}$ for some $k_0 \geq 0$. Can you conclude that these are the same
solution: $\mathbf{u}^{(k)} = \tilde{\mathbf{u}}^{(k)}$ for all $k$? (b) What can you say if $\mathbf{u}^{(k_0)} = \tilde{\mathbf{u}}^{(k_1)}$ for some $k_0 \neq k_1$?

♦ 9.1.40. Let $T$ be an incomplete matrix, and suppose $\mathbf{w}_1, \ldots, \mathbf{w}_j$ is a Jordan chain associated
with an incomplete eigenvalue $\lambda$. (a) Prove that, for $i = 1, \ldots, j$,

$$T^k \mathbf{w}_i = \lambda^k \mathbf{w}_i + k \lambda^{k-1} \mathbf{w}_{i-1} + \binom{k}{2} \lambda^{k-2} \mathbf{w}_{i-2} + \cdots. \tag{9.23}$$

(b) Explain how to use a Jordan basis of $T$ to construct the general solution to the linear
iterative system $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$.


9.1.41. Use the method of Exercise 9.1.40 to find the general real solution to the following linear
iterative systems:

(a) $u^{(k+1)} = 2u^{(k)} + 3v^{(k)}$, $v^{(k+1)} = 2v^{(k)}$,

(b) $u^{(k+1)} = u^{(k)} + v^{(k)}$, $v^{(k+1)} = -4u^{(k)} + 5v^{(k)}$,

(c)  v} 


,(k+l)  =  u(k)  +  v(k) 

(fe+1)  =  -u{k)  Wfe)  +W{k) 


(d)  w(fe+1)  =  3w(fe)  -  w(fe),  w(fc+1)  =  -u(fc)  +  3«(fe)  + 


(fc+1)  (fc)  |  (fc) 

vy  1  =  —  1  +  Wy  1  , 

(*0 


re 


(fc+i)  (fc) 

itr  7  =  —  ar  1 

(fc+i)  (k) 

wy  ’  =  —  m  y 


(e)  u 

(f)  u 


(k+D  =  u(k)  _  v(k)  _  wW 


V 


(fe+1)  =  2u(fe)+2v(fe)+2'u;(fe) 


(/c+1) 

kt  '  =  —a 


+  3  a; 

(fc) 


(fc) 


i  (&)  |  (k) 

+  tr  '  +iev  ' 


0+1)  _  v(fc)  _|_  ^(fc) 


(fc+i)  (fc)  ,  (fc) 

'  =  — id  '  -f -  wK  J 


(k+ 1)  (fc) 

ar  J  =  zK  J 


(k+ 1)  (fc) 

zK  =  —  rw  ’ 


9.1.42. Find a formula for the $k$th power of a Jordan block matrix. Hint: Use Exercise 9.1.40.


♥ 9.1.43. An affine iterative system has the form $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)} + \mathbf{b}$, $\mathbf{u}^{(0)} = \mathbf{c}$.

(a) Under what conditions does the system have an equilibrium solution $\mathbf{u}^{(k)} \equiv \mathbf{u}^*$?

(b) In such cases, find a formula for the general solution. Hint: Look at $\mathbf{v}^{(k)} = \mathbf{u}^{(k)} - \mathbf{u}^*$.

(c) Solve the following affine iterative systems:


f  —3  2  —2  \ 

f  o 

(  1\ 

(Hi)  u^+1)  = 

-6  4  -3 

u(fe)  +  [  _3 

,  U<°>  = 

0 

^  12  -6  -5 ) 

V  o ) 

1-1/ 

/  5  i  i  \ 

(  1  \ 

(  1  \ 

6  3  6 

6 

6 

( iv )  = 

0  ~h  s 

(k)  | 
uv  '  + 

1 

3 

,  u<°>  = 

2 

3 

l  i-i  V 

v  -\) 

V  V 

(d)  Discuss  what  happens  in  cases  in  which  there  is  no  fixed  point,  assuming  that 
T  is  complete. 


9.2  Stability 

With  the  solution  formula  (9.9)  in  hand,  we  are  now  in  a  position  to  understand  the 
qualitative  behavior  of  solutions  to  (complete)  linear  iterative  systems.  The  most  important 
case  for  applications  is  when  all  the  iterates  converge  to  0. 

Definition 9.9. The equilibrium solution $\mathbf{u}^* = \mathbf{0}$ to a linear iterative system (9.1) is called
globally asymptotically stable if all solutions $\mathbf{u}^{(k)} \to \mathbf{0}$ as $k \to \infty$.

Asymptotic stability relies on the following property of the coefficient matrix.

Definition 9.10. A matrix $T$ is called convergent if its powers converge to the zero matrix,
$T^k \to O$, meaning that the individual entries of $T^k$ all go to $0$ as $k \to \infty$.

The equivalence of the convergence condition and stability of the iterative system follows
immediately from the solution formula (9.7).


Theorem 9.11. The linear iterative system $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$ has globally asymptotically
stable zero solution if and only if $T$ is a convergent matrix.


Proof: If $T^k \to O$, and $\mathbf{u}^{(k)} = T^k \mathbf{a}$ is any solution, then clearly $\mathbf{u}^{(k)} \to \mathbf{0}$ as $k \to \infty$,
proving stability. Conversely, the solution $\mathbf{u}_j^{(k)} = T^k \mathbf{e}_j$ is the same as the $j$th column of
$T^k$. If the origin is asymptotically stable, then $\mathbf{u}_j^{(k)} \to \mathbf{0}$. Thus, the individual columns of
$T^k$ all tend to $\mathbf{0}$, proving that $T^k \to O$. Q.E.D.

To facilitate the analysis of convergence, we shall adopt a norm $\| \cdot \|$ on our underlying
vector space, $\mathbb{R}^n$ or $\mathbb{C}^n$. The reader may be inclined to choose the Euclidean (or Hermitian)
norm, but, in practice, the $\infty$ norm

$$\| \mathbf{u} \|_\infty = \max \{\, |u_1|, \ldots, |u_n| \,\} \tag{9.24}$$

prescribed by the vector's maximal entry (in modulus) is often easier to work with. Convergence
of the iterates is equivalent to convergence of their norms:

$$\mathbf{u}^{(k)} \to \mathbf{0} \quad \text{if and only if} \quad \| \mathbf{u}^{(k)} \| \to 0 \quad \text{as} \quad k \to \infty.$$


The  fundamental  stability  criterion  for  linear  iterative  systems  relies  on  the  size  of  the 
eigenvalues  of  the  coefficient  matrix. 

Theorem 9.12. The matrix $T$ is convergent, and hence the zero solution of the associated
linear iterative system (9.1) is globally asymptotically stable, if and only if all its (complex)
eigenvalues have modulus strictly less than one: $|\lambda_j| < 1$.

Proof: Let us prove this result assuming that the coefficient matrix $T$ is complete. (The
proof in the incomplete case relies on the Jordan canonical form, and is outlined in Exercise
9.2.18.) If $\lambda_j$ is an eigenvalue such that $|\lambda_j| < 1$, then the corresponding basis solution
$\mathbf{u}_j^{(k)} = \lambda_j^k \mathbf{v}_j$ tends to zero as $k \to \infty$; indeed,

$$\| \mathbf{u}_j^{(k)} \| = \| \lambda_j^k \mathbf{v}_j \| = |\lambda_j|^k \, \| \mathbf{v}_j \| \to 0, \quad \text{since} \quad |\lambda_j| < 1.$$

Therefore, if all eigenvalues are less than 1 in modulus, all terms in the solution formula
(9.9) tend to zero, which proves asymptotic stability: $\mathbf{u}^{(k)} \to \mathbf{0}$. Conversely, if any eigenvalue
satisfies $|\lambda_j| \geq 1$, then the solution $\mathbf{u}^{(k)} = \lambda_j^k \mathbf{v}_j$ does not tend to $\mathbf{0}$ as $k \to \infty$, and
hence $\mathbf{0}$ is not asymptotically stable. Q.E.D.


Spectral  Radius 

Consequently, the necessary and sufficient condition for asymptotic stability of a linear
iterative system is that all the eigenvalues of the coefficient matrix lie strictly inside the
unit circle in the complex plane: $|\lambda_j| < 1$. This criterion can be recast using the following
important definition.

Definition 9.13. The spectral radius of a matrix $T$ is defined as the maximal modulus of
all of its real and complex eigenvalues: $\rho(T) = \max \{\, |\lambda_1|, \ldots, |\lambda_k| \,\}$.


Theorem 9.14. The matrix $T$ is convergent if and only if its spectral radius is strictly
less than one: $\rho(T) < 1$.
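To make the criterion concrete, here is a minimal Python sketch, restricted to real $2 \times 2$ matrices so that the eigenvalues come from the quadratic formula and no external library is needed. The function names are illustrative assumptions; for larger matrices one would normally rely on a numerical eigenvalue routine instead.

```python
import cmath

def eigenvalues_2x2(T):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    (a, b), (c, d) = T
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)   # complex square root
    return (trace + root) / 2, (trace - root) / 2

def spectral_radius_2x2(T):
    """rho(T): the largest modulus among the eigenvalues."""
    return max(abs(lam) for lam in eigenvalues_2x2(T))

def is_convergent_2x2(T):
    """Theorem 9.14: T is convergent exactly when rho(T) < 1."""
    return spectral_radius_2x2(T) < 1

print(spectral_radius_2x2([[7, 6], [-9, -8]]))        # eigenvalues 1 and -2
print(is_convergent_2x2([[0.5, -0.5], [0.5, 0.5]]))   # eigenvalues (1 +/- i)/2
```

The first matrix has an eigenvalue of modulus 2 and so is not convergent; the second has complex conjugate eigenvalues of modulus $\sqrt2/2$ and is convergent.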
If $T$ is complete, then we can apply the triangle inequality to (9.9) to estimate

$$\begin{aligned} \| \mathbf{u}^{(k)} \| &= \| c_1 \lambda_1^k \mathbf{v}_1 + \cdots + c_n \lambda_n^k \mathbf{v}_n \| \leq |\lambda_1|^k \, \| c_1 \mathbf{v}_1 \| + \cdots + |\lambda_n|^k \, \| c_n \mathbf{v}_n \| \\ &\leq \rho(T)^k \bigl( |c_1| \, \| \mathbf{v}_1 \| + \cdots + |c_n| \, \| \mathbf{v}_n \| \bigr) = C \rho(T)^k, \end{aligned} \tag{9.25}$$

for some constant $C > 0$ that depends only upon the initial conditions. In particular, if
$\rho(T) < 1$, then

$$\| \mathbf{u}^{(k)} \| \leq C \rho(T)^k \to 0 \quad \text{as} \quad k \to \infty, \tag{9.26}$$

in accordance with Theorem 9.14. Thus, the spectral radius $\rho(T)$ prescribes the rate of
convergence of the solutions to equilibrium; the smaller the spectral radius, the faster the
solutions go to $\mathbf{0}$.
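The estimate (9.25) can be tested on a small case. The sketch below uses a hypothetical symmetric matrix (not taken from the text) with eigenvalues $\tfrac34$ and $\tfrac14$ and eigenvectors $(1, 1)^T$, $(1, -1)^T$, together with the $\infty$ norm and exact rational arithmetic, and checks that $\| \mathbf{u}^{(k)} \|_\infty \leq C \rho(T)^k$ with $C = |c_1| \, \| \mathbf{v}_1 \| + |c_2| \, \| \mathbf{v}_2 \|$.

```python
from fractions import Fraction as F

# Hypothetical complete matrix: eigenvalues 3/4 and 1/4,
# eigenvectors v1 = (1, 1) and v2 = (1, -1).
T = [[F(1, 2), F(1, 4)],
     [F(1, 4), F(1, 2)]]
rho = F(3, 4)

def inf_norm(u):
    """The infinity norm (9.24): largest entry in modulus."""
    return max(abs(x) for x in u)

def step(T, u):
    """One iteration u -> T u."""
    return [sum(t * x for t, x in zip(row, u)) for row in T]

# Initial data a = 2 v1 + 1 v2, so c1 = 2 and c2 = 1.
a = [F(3), F(1)]
C = 2 * inf_norm([1, 1]) + 1 * inf_norm([1, -1])   # constant in (9.25)

def bound_holds(steps=40):
    """Check ||u^(k)|| <= C rho^k for k = 0, ..., steps - 1."""
    u = a
    for k in range(steps):
        if inf_norm(u) > C * rho ** k:
            return False
        u = step(T, u)
    return True

print(C, bound_holds())
```

Here the bound is attained at $k = 0$ and becomes progressively less tight as the subdominant term $(\tfrac14)^k$ dies out.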

If $T$ has only one largest (simple) eigenvalue, so $|\lambda_1| > |\lambda_j|$ for all $j > 1$, then the
first term in the solution formula (9.9) will eventually dominate all the others: $\| \lambda_1^k \mathbf{v}_1 \| \gg
\| \lambda_j^k \mathbf{v}_j \|$ for $j > 1$ and $k \gg 0$. Therefore, provided that $c_1 \neq 0$, the solution (9.9) has the
asymptotic formula

$$\mathbf{u}^{(k)} \approx c_1 \lambda_1^k \mathbf{v}_1, \tag{9.27}$$

and so most solutions end up parallel to $\mathbf{v}_1$. In particular, if $|\lambda_1| = \rho(T) < 1$, such a
solution approaches $\mathbf{0}$ along the direction of the dominant eigenvector $\mathbf{v}_1$ at a rate governed
by the modulus of the dominant eigenvalue. The exceptional solutions, with $c_1 = 0$, tend
to $\mathbf{0}$ at a faster rate, along one of the other eigendirections. In practical computations,
one rarely observes the exceptional solutions. Indeed, even if the initial condition does
not involve the dominant eigenvector, numerical errors during the iteration will almost
inevitably introduce a small component in the direction of $\mathbf{v}_1$, which will, if you wait long
enough, eventually dominate the solution.

The inequality (9.25) applies only to complete matrices. In the general case, one can
prove, cf. Exercise 9.2.18, that the solution satisfies the slightly weaker inequality

$$\| \mathbf{u}^{(k)} \| \leq C \sigma^k \quad \text{for all} \quad k \geq 0, \quad \text{where} \quad \sigma > \rho(T) \tag{9.28}$$

is any number larger than the spectral radius, while $C > 0$ is a positive constant (whose
value may depend on how close $\sigma$ is to $\rho$).
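The need for $\sigma > \rho(T)$ in (9.28) can be seen on the simplest incomplete matrix. The following sketch uses a hypothetical $2 \times 2$ Jordan block with eigenvalue $\tfrac12$ (an assumption chosen for illustration, not an example from the text) and compares the norms of the iterates with $\rho^k$ and with $\sigma^k$ for $\sigma = \tfrac34$.

```python
from fractions import Fraction as F

# Hypothetical incomplete matrix: a 2 x 2 Jordan block with eigenvalue 1/2.
T = [[F(1, 2), F(1)],
     [F(0), F(1, 2)]]
rho = F(1, 2)       # spectral radius
sigma = F(3, 4)     # any number larger than rho

def inf_norm(u):
    return max(abs(x) for x in u)

def step(T, u):
    return [sum(t * x for t, x in zip(row, u)) for row in T]

def scaled_norms(base, steps):
    """||u^(k)||_inf / base^k for k = 0, ..., steps - 1, with u^(0) = (0, 1)."""
    u = [F(0), F(1)]
    out = []
    for k in range(steps):
        out.append(inf_norm(u) / base ** k)
        u = step(T, u)
    return out

by_rho = scaled_norms(rho, 60)       # grows like 2k, so no constant C fits (9.25)
by_sigma = scaled_norms(sigma, 60)   # stays bounded, consistent with (9.28)
print(float(by_rho[50]), float(max(by_sigma)))
```

For this block the iterates are $\mathbf{u}^{(k)} = (k\,2^{1-k},\ 2^{-k})^T$, so the ratio to $\rho^k$ equals $2k$ for $k \geq 1$, while the ratio to $\sigma^k$ never exceeds $16/9$.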


Example 9.15. According to Example 9.7, the matrix

$$T = \begin{pmatrix} -3 & 1 & 6 \\ 1 & -1 & -2 \\ -1 & -1 & 0 \end{pmatrix} \quad \text{has eigenvalues} \quad \lambda_1 = -2, \quad \lambda_2 = -1 + i, \quad \lambda_3 = -1 - i.$$

Since $|\lambda_1| = 2 > |\lambda_2| = |\lambda_3| = \sqrt2$, the spectral radius is $\rho(T) = |\lambda_1| = 2$. We conclude
that $T$ is not a convergent matrix. As the reader can check, either directly, or from the
solution formula (9.18), the vectors $\mathbf{u}^{(k)} = T^k \mathbf{u}^{(0)}$ obtained by repeatedly multiplying any
nonzero initial vector $\mathbf{u}^{(0)}$ by $T$ rapidly go off to $\infty$, in successively opposite directions, at
a rate roughly equal to $\rho(T)^k = 2^k$.

On the other hand, the matrix

$$T = \begin{pmatrix} 1 & -\frac13 & -2 \\ -\frac13 & \frac13 & \frac23 \\ \frac13 & \frac13 & 0 \end{pmatrix} \quad \text{with eigenvalues} \quad \lambda_1 = \tfrac23, \quad \lambda_2 = \tfrac13 - \tfrac13 i, \quad \lambda_3 = \tfrac13 + \tfrac13 i,$$

has spectral radius $\rho(T) = \tfrac23$, and hence is a convergent matrix. According to (9.27), if we
write the initial data $\mathbf{u}^{(0)} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + c_3 \mathbf{v}_3$ as a linear combination of the eigenvectors,
then, provided $c_1 \neq 0$, the iterates have the asymptotic form $\mathbf{u}^{(k)} \approx c_1 \left( \tfrac23 \right)^k \mathbf{v}_1$, where
$\mathbf{v}_1 = (4, -2, 1)^T$ is the eigenvector corresponding to the dominant eigenvalue $\lambda_1 = \tfrac23$.
Thus, for most initial vectors, the iterates end up decreasing in length by a factor of almost
exactly $\tfrac23$, eventually becoming parallel to the dominant eigenvector $\mathbf{v}_1$. This is borne out
by a sample computation: starting with $\mathbf{u}^{(0)} = (1, 1, 1)^T$, we obtain, for instance,

$$\mathbf{u}^{(15)} = \begin{pmatrix} -.018216 \\ .009135 \\ -.004567 \end{pmatrix}, \qquad \mathbf{u}^{(16)} = \begin{pmatrix} -.012126 \\ .006072 \\ -.003027 \end{pmatrix}, \qquad \mathbf{u}^{(17)} = \begin{pmatrix} -.008096 \\ .004048 \\ -.002018 \end{pmatrix},$$

which form progressively more accurate scalar multiples of the dominant eigenvector $\mathbf{v}_1 =
(4, -2, 1)^T$; moreover, the ratios between their successive entries, $u_i^{(k+1)} / u_i^{(k)}$, are
approaching the dominant eigenvalue $\lambda_1 = \tfrac23$.
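The sample computation in Example 9.15 is easy to reproduce. The sketch below iterates the second (convergent) matrix of the example from $\mathbf{u}^{(0)} = (1, 1, 1)^T$ in exact rational arithmetic and then inspects the entrywise ratios of successive iterates. The matrix entries are taken to be $-\tfrac13$ times the matrix of Example 9.7, which is an assumption consistent with the eigenvalues quoted above; the helper names are illustrative.

```python
from fractions import Fraction as F

# Assumed convergent matrix of Example 9.15 (eigenvalues 2/3, 1/3 -/+ i/3).
T = [[F(1), F(-1, 3), F(-2)],
     [F(-1, 3), F(1, 3), F(2, 3)],
     [F(1, 3), F(1, 3), F(0)]]

def step(T, u):
    """One iteration u -> T u."""
    return [sum(t * x for t, x in zip(row, u)) for row in T]

def iterate(T, u0, k):
    """Return u^(k) = T^k u0 by repeated multiplication."""
    u = list(u0)
    for _ in range(k):
        u = step(T, u)
    return u

u15 = iterate(T, [F(1), F(1), F(1)], 15)
u16 = step(T, u15)
ratios = [float(b / a) for a, b in zip(u15, u16)]

print([round(float(x), 6) for x in u15])
print(ratios)   # each entry should be close to the dominant eigenvalue 2/3
```

The ratios are not yet equal to $\tfrac23$ at $k = 15$ because the complex eigenvalues of modulus $\sqrt2/3$ still contribute a small oscillating correction.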


Exercises 


9.2.1.  Determine  the  spectral  radius  of  the  following  matrices: 

0  1  0\  /- 1 


(a) 


1 

3 


2 

4 


( b ) 


(c) 


( d ) 


0  0  1 
V  —2  1  2 

9.2.2.  Determine  whether  or  not  the  following  matrices  are  convergent: 


V 


5 

4  0 
4  -4 


—9  \ 
-1 
3  / 


(a) 


2 

3 


3 

2 


(b) 


.6 

.3 


.3 

.7 


(c)  - 
v  ;  5 


5 

-3 

-2\ 

( .8 

.3 

.2  \ 

1 

-2 

1  > 

(d) 

.1 

.2 

.6 

\1 

-5 

4/ 

V-1 

.5 

•2/ 

9.2.3.  Which  of  the  listed  coefficient  matrices  defines  a  linear  iterative  system  with 
asymptotically  stable  zero  solution? 


(a) 


-3  0 

-4  -1 


(*>) 


(c) 


1 

2 
1 
2 


( d ) 


/ - 1 
-1 
V  o 


(e) 


/ 


V 


1  1 

2  4 

1  3 

2  4 

1  _  1 

4  4 


4 

1 

2 

1 


\ 

I—1 2 * 4 

O 

do 

to 

^  3  0  -1\ 

1  1  9  3 

,  (f) 

0  1  0  ,  (g) 

2  2  z  2 

1  n  3  2 

l  2  0  0/ 

6^2  3 

/ 

\  / 

1  0  -3  -§/ 

9.2.4.  (a)  Determine  the  eigenvalues  and  spectral  radius  of  the  matrix  T  = 

/ 


/  3 
-2 
0 


(b)  Use  part  (a)  to  find  the  eigenvalues  and  spectral  radius  of  T  = 


\ 

3 

5 

2 

5 


2 

1 

2 


2 

5 

1 

5 


—2  \ 
0 

1/ 


2  \ 
5 

0 


V 


o  I 


1 

5 


/ 


(c) Write down an asymptotic formula for the solutions to $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$.

9.2.5. (a) Show that the spectral radius of $T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ is $\rho(T) = 1$.

(b) Show that most iterates $\mathbf{u}^{(k)} = T^k \mathbf{u}^{(0)}$ become unbounded as $k \to \infty$.

(c) Discuss why the inequality $\| \mathbf{u}^{(k)} \| \leq C \rho(T)^k$ does not hold when the coefficient
matrix is incomplete. (d) Can you prove that (9.28) holds in this example?
9.2.6. Given a linear iterative system with non-convergent matrix, which solutions, if any,
will converge to $\mathbf{0}$?

♦ 9.2.7. Suppose $T$ is a complete matrix. (a) Prove that every solution to the corresponding
linear iterative system is bounded if and only if $\rho(T) \leq 1$. (b) Can you generalize this
result to incomplete matrices? Hint: Look at Exercise 9.1.40.


♥ 9.2.8. Discuss the asymptotic behavior of solutions to an iterative system that has two
eigenvalues of largest modulus, e.g., $\lambda_1 = -\lambda_2$, or $\lambda_1 = \overline{\lambda_2}$ are complex conjugate
eigenvalues. How would you detect this? How can you determine the eigenvalues and
eigenvectors?

9.2.9. Suppose $T$ has spectral radius $\rho(T)$. Can you predict the spectral radius of $aT + bI$,
where $a, b$ are scalars? If not, what additional information do you need?

9.2.10. Prove that if $A$ is any square matrix, then there exists $c \neq 0$ such that the scalar
multiple $cA$ is a convergent matrix. Find a formula for the largest possible such $c$.


♥ 9.2.11. Let $M_n$ be the $n \times n$ tridiagonal matrix with all 1's on the sub- and super-diagonals,
and zeros on the main diagonal. (a) What is the spectral radius of $M_n$? Hint: Use
Exercise 8.2.47. (b) Is $M_n$ convergent? (c) Find the general solution to the iterative
system $\mathbf{u}^{(k+1)} = M_n \mathbf{u}^{(k)}$.

♥ 9.2.12. Let $\alpha, \beta$ be scalars. Let $T_{\alpha, \beta}$ be the $n \times n$ tridiagonal matrix that has all $\alpha$'s on the
sub- and super-diagonals, and $\beta$'s on the main diagonal. (a) Solve the iterative system
$\mathbf{u}^{(k+1)} = T_{\alpha, \beta}\, \mathbf{u}^{(k)}$. (b) For which values of $\alpha, \beta$ is the system asymptotically stable?
Hint: Combine Exercises 9.2.11 and 9.1.32.

9.2.13. (a) Prove that if $|\det T| > 1$, then the iterative system $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$ is unstable.
(b) If $|\det T| < 1$, is the system asymptotically stable? Prove or give a counterexample.


9.2.14. True or false: (a) $\rho(cA) = c\,\rho(A)$, (b) $\rho(S^{-1} A S) = \rho(A)$, (c) $\rho(A^2) = \rho(A)^2$,
(d) $\rho(A^{-1}) = 1/\rho(A)$, (e) $\rho(A + B) = \rho(A) + \rho(B)$, (f) $\rho(AB) = \rho(A)\,\rho(B)$.

9.2.15. True or false: (a) If $T$ is convergent, then $T^2$ is convergent.
(b) If $A$ is convergent, then $T = A^T A$ is convergent.


9.2.16. Suppose $T^k \to P$ as $k \to \infty$. (a) Prove that $P$ is idempotent: $P^2 = P$.

(b) Can you characterize all such matrices $P$?

(c) What are the conditions on the matrix $A$ for this to happen?

9.2.17. Prove that a matrix $T$ with all integer entries is convergent if and only if it is nilpotent,
i.e., $T^k = O$ for some $k > 0$. Give a nonzero example of such a matrix.


♦ 9.2.18. Prove the inequality (9.28) when $T$ is incomplete. Use it to complete the proof of
Theorem 9.14 in the incomplete case. Hint: Use Exercises 9.1.40, 9.2.22.


♦ 9.2.19. Suppose that $M$ is a nonsingular matrix. (a) Prove that the implicit iterative system
$M \mathbf{u}^{(n+1)} = \mathbf{u}^{(n)}$ has globally asymptotically stable zero solution if and only if all
the eigenvalues of $M$ are strictly greater than one in magnitude: $|\mu_i| > 1$. (b) Let
$K$ be another matrix. Prove that the more general implicit iterative system of the form
$M \mathbf{u}^{(n+1)} = K \mathbf{u}^{(n)}$ has globally asymptotically stable zero solution if and only if all the
generalized eigenvalues of the matrix pair $K, M$, as in Exercise 8.5.8, are strictly less than 1
in magnitude: $|\lambda_i| < 1$.

♦ 9.2.20. The stable subspace $S \subset \mathbb{R}^n$ for a linear iterative system $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$ is defined
as the set of all points $\mathbf{a}$ such that the solution with initial condition $\mathbf{u}^{(0)} = \mathbf{a}$ satisfies
$\mathbf{u}^{(k)} \to \mathbf{0}$ as $k \to \infty$. (a) Prove that $S$ is an invariant subspace for the matrix $T$.

(b) Determine necessary and sufficient conditions for $\mathbf{a} \in S$.

(c) Find the stable subspace for the linear systems in Exercise 9.1.14.

♥ 9.2.21. Consider a second order iterative system $\mathbf{u}^{(k+2)} = A \mathbf{u}^{(k+1)} + B \mathbf{u}^{(k)}$, where $A, B$
are $n \times n$ matrices. Define a quadratic eigenvalue to be a complex number $\lambda$ that satisfies
$\det(\lambda^2 I - \lambda A - B) = 0$. Prove that the zero solution is globally asymptotically stable if
and only if all its quadratic eigenvalues satisfy $|\lambda| < 1$.

♦ 9.2.22. Let $p(t)$ be a polynomial. Assume $0 < \lambda < \mu$. Prove that there is a positive constant $C$
such that $p(n)\, \lambda^n < C \mu^n$ for all $n > 0$.


Fixed  Points 


The zero vector $\mathbf{0}$ is always a fixed point for a linear iterative system $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$,
since $\mathbf{0} = T \mathbf{0}$, and so $\mathbf{u}^{(k)} \equiv \mathbf{0}$ is an equilibrium solution. Are there any others? The
answer is immediate: $\mathbf{u}^*$ is a fixed point if and only if $\mathbf{u}^* = T \mathbf{u}^*$, and hence $\mathbf{u}^*$ satisfies the
eigenvalue equation for $T$ with the unit eigenvalue $\lambda = 1$. Thus, the system admits a
nonzero fixed point if and only if the coefficient matrix $T$ has 1 as an eigenvalue. Since every
nonzero scalar multiple of the eigenvector $\mathbf{u}^*$ is also an eigenvector, in such cases the system
has infinitely many fixed points, namely all elements of the eigenspace $V_1 = \ker(T - I)$,
including $\mathbf{0}$. We are interested in whether the fixed points are stable in the sense that
solutions having nearby initial conditions remain nearby. More precisely:


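Definition 9.16. A fixed point $\mathbf{u}^*$ of an iterative system $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$ is called stable if
for every $\varepsilon > 0$ there exists a $\delta > 0$ such that whenever $\| \mathbf{u}^{(0)} - \mathbf{u}^* \| < \delta$, then the resulting
iterates satisfy $\| \mathbf{u}^{(k)} - \mathbf{u}^* \| < \varepsilon$ for all $k$.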

The stability of the fixed points, at least if the coefficient matrix is complete, is governed
by the same solution formula (9.9). If the eigenvalue $\lambda_1 = 1$ is simple, and all other
eigenvalues are less than one in modulus, so

$$1 = \lambda_1 > |\lambda_2| \geq \cdots \geq |\lambda_n|,$$

then the solution takes the asymptotic form

$$\mathbf{u}^{(k)} = c_1 \mathbf{v}_1 + c_2 \lambda_2^k \mathbf{v}_2 + \cdots + c_n \lambda_n^k \mathbf{v}_n \to c_1 \mathbf{v}_1, \quad \text{as} \quad k \to \infty, \tag{9.29}$$

converging to one of the fixed points, i.e., to a multiple of the eigenvector $\mathbf{v}_1$. The coefficient
$c_1$ is prescribed by the initial conditions, cf. (9.10). The rate of convergence of the solution
is governed by the modulus $|\lambda_2|$ of the subdominant eigenvalue.


Proposition 9.17. Suppose that $T$ has a simple (or, more generally, complete) eigenvalue
$\lambda_1 = 1$, and, moreover, all other eigenvalues satisfy $|\lambda_j| < 1$. Then all solutions to the
linear iterative system $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$ converge to a vector $\mathbf{v} \in V_1$ that lies in the $\lambda_1 = 1$
eigenspace. Moreover, all the fixed points $\mathbf{v} \in V_1$ of $T$ are stable.

Stability of a fixed point does not imply asymptotic stability, since nearby solutions
may converge to a nearby fixed point, i.e., a slightly different element of the eigenspace $V_1$.

The general necessary and sufficient condition for stability of the fixed points of a linear
iterative system is governed by the spectral radius of its coefficient matrix, as follows. The
proof is relegated to Exercise 9.2.28.


Theorem 9.18. The fixed points of an iterative system $\mathbf{u}^{(k+1)} = T \mathbf{u}^{(k)}$ are stable if and
only if $\rho(T) \leq 1$ and, moreover, every eigenvalue of modulus $|\lambda| = 1$ is complete.
Thus, with regard to linear iterative systems, either all fixed points are stable or all
are unstable. Keep in mind that the fixed points are the elements of the eigenspace $V_1$
corresponding to the eigenvalue $\lambda = 1$, if such exists. If 1 is not an eigenvalue of $T$, then
$\mathbf{u}^* = \mathbf{0}$ is the only fixed point.

Example  9.19.  Consider  the  iterative  system  with  coefficient  matrix 


-3\ 

1 

0/ 


The  eigenvalues  and  corresponding  eigenvectors  are 


\  =  1, 


v 


l 


i 


Since $\lambda_1 = 1$, every scalar multiple of the eigenvector $\mathbf{v}_1$ is a fixed point. The fixed points are stable, since the remaining eigenvalues have modulus $|\lambda_2| = |\lambda_3| = \tfrac12\sqrt{2} \approx .7071 < 1$. Thus, the iterates $\mathbf{u}^{(k)} = T^k\mathbf{a} \to c_1\mathbf{v}_1$ will eventually converge to a multiple of the first eigenvector; in almost all cases the convergence rate is $\tfrac12\sqrt{2}$. For example, starting with $\mathbf{u}^{(0)} = (1, 1, 1)^T$ leads to the iterates†


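$$\mathbf{u}^{(5)} = \begin{pmatrix} -9.5 \\ 4.75 \\ -2.75 \end{pmatrix}, \quad \mathbf{u}^{(10)} = \begin{pmatrix} -7.9062 \\ 3.9062 \\ -1.9062 \end{pmatrix}, \quad \mathbf{u}^{(15)} = \begin{pmatrix} -7.9766 \\ 4.0 \\ -2.0 \end{pmatrix},$$
$$\mathbf{u}^{(20)} = \begin{pmatrix} -8.0088 \\ 4.0029 \\ -2.0029 \end{pmatrix}, \quad \mathbf{u}^{(25)} = \begin{pmatrix} -7.9985 \\ 3.9993 \\ -1.9993 \end{pmatrix}, \quad \mathbf{u}^{(30)} = \begin{pmatrix} -8.0001 \\ 4.0001 \\ -2.0001 \end{pmatrix},$$
which are gradually converging to the particular eigenvector $(-8, 4, -2)^T = -2\,\mathbf{v}_1$. This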

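can be predicted in advance by decomposing the initial vector into a linear combination of the eigenvectors:
$$\mathbf{u}^{(0)} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = -2 \begin{pmatrix} 4 \\ -2 \\ 1 \end{pmatrix} + \frac{3+3i}{2} \begin{pmatrix} 2-i \\ -1 \\ 1 \end{pmatrix} + \frac{3-3i}{2} \begin{pmatrix} 2+i \\ -1 \\ 1 \end{pmatrix},$$
whence
$$\mathbf{u}^{(k)} = \begin{pmatrix} -8 \\ 4 \\ -2 \end{pmatrix} + \frac{3+3i}{2} \left(\frac{1-i}{2}\right)^{k} \begin{pmatrix} 2-i \\ -1 \\ 1 \end{pmatrix} + \frac{3-3i}{2} \left(\frac{1+i}{2}\right)^{k} \begin{pmatrix} 2+i \\ -1 \\ 1 \end{pmatrix},$$
and so $\mathbf{u}^{(k)} \to (-8, 4, -2)^T$ as $k \to \infty$. Despite the complex formula, the solution is, in fact, real.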


† Since the convergence is slow, we display only every fifth iterate.
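The behavior described in Example 9.19 can be checked numerically. The following is a minimal sketch, assuming the coefficient matrix has rows $(\tfrac32, -\tfrac12, -3)$, $(-\tfrac12, \tfrac12, 1)$, $(\tfrac12, \tfrac12, 0)$, which is consistent with the eigenvector $\mathbf{v}_1 = (4, -2, 1)^T$ and the iterates quoted in the example. The helper names `apply` and `iterate` are illustrative only. Exact rational arithmetic is used so that no rounding error enters the comparison.

```python
from fractions import Fraction as F

# Coefficient matrix assumed for Example 9.19 (rows listed top to bottom).
T = [
    [F(3, 2), F(-1, 2), F(-3)],
    [F(-1, 2), F(1, 2), F(1)],
    [F(1, 2), F(1, 2), F(0)],
]


def apply(matrix, vector):
    """Return the matrix-vector product as a list of exact fractions."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def iterate(matrix, start, steps):
    """Return the list of iterates u^(0), ..., u^(steps) of u^(k+1) = T u^(k)."""
    history = [list(start)]
    for _ in range(steps):
        history.append(apply(matrix, history[-1]))
    return history


v1 = [F(4), F(-2), F(1)]        # eigenvector for the eigenvalue 1
limit = [-2 * x for x in v1]    # predicted limit -2 v_1 = (-8, 4, -2)
history = iterate(T, [F(1), F(1), F(1)], 30)

# Display every fifth iterate together with its distance from the limit.
for k in range(5, 31, 5):
    u = history[k]
    error = max(abs(a - b) for a, b in zip(u, limit))
    print(k, [round(float(x), 4) for x in u], f"max error {float(error):.1e}")
```

The printed errors shrink by roughly a factor of $(\tfrac12\sqrt2)^5 \approx .18$ from one displayed iterate to the next, in line with the stated convergence rate.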


Exercises


9.2.23. Find all fixed points for the iterative systems with the following coefficient matrices:

(c) $\begin{pmatrix} -1 & -1 & -4 \\ -2 & 0 & -4 \\ 1 & -1 & 0 \end{pmatrix}$, (d) $\begin{pmatrix} 2 & 1 & -1 \\ 2 & 3 & -2 \\ -1 & -1 & 2 \end{pmatrix}$.

9.2.24. Discuss the stability of each fixed point and the asymptotic behavior(s) of the solutions to the systems in Exercise 9.2.23. Which fixed point, if any, does the solution with initial condition $\mathbf{u}^{(0)} = \mathbf{e}_1$ converge to?


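9.2.25. Suppose $T$ is a symmetric matrix that satisfies the hypotheses of Proposition 9.17 with a simple eigenvalue $\lambda_1 = 1$. Prove that the solution $\mathbf{u}^{(k)}$ to the linear iterative system $\mathbf{u}^{(k+1)} = T\,\mathbf{u}^{(k)}$ has limiting value $\displaystyle \lim_{k \to \infty} \mathbf{u}^{(k)} = \frac{\mathbf{u}^{(0)} \cdot \mathbf{v}_1}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1$, where $\mathbf{v}_1$ is an eigenvector for $\lambda_1 = 1$.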


9.2.26.  True  or  false:  If  T  has  a  stable  nonzero  fixed  point,  then  it  is  a  convergent  matrix. 

9.2.27. True or false: If every point $\mathbf{u} \in \mathbb{R}^n$ is a fixed point, then they are all stable. Can you characterize such systems?


♦ 9.2.28. Prove Theorem 9.18: (a) assuming $T$ is complete, (b) for general $T$.

Hint:  Use  Exercise  9.1.40. 

T 9.2.29. (a) Under what conditions does the linear iterative system $\mathbf{u}^{(k+1)} = T\,\mathbf{u}^{(k)}$ have a period 2 solution, meaning that the iterates repeat after every other iterate: $\mathbf{u}^{(k+2)} = \mathbf{u}^{(k)} \ne \mathbf{u}^{(k+1)}$? Give an example of such a system. (b) Under what conditions is there a unique period 2 solution? (c) What about a period $m$ solution for $2 < m \in \mathbb{N}$?


Matrix  Norms  and  Convergence 

As we now know, the convergence of a linear iterative system is governed by the spectral radius, or, equivalently, the modulus of the largest eigenvalue of the coefficient matrix. Unfortunately, finding accurate approximations to the eigenvalues of most matrices is a nontrivial computational task. Indeed, as we will learn in Section 9.5, all practical numerical algorithms rely on some form of iteration. But using iteration to determine the spectral radius defeats the purpose, which is to predict the behavior of the iterative system in advance! One independent means of accomplishing this is through matrix norms, as introduced at the end of Section 3.3.


Let $\|\cdot\|$ denote a norm on† $\mathbb{R}^n$. Theorem 3.20 defines the induced natural matrix norm on the space of $n \times n$ matrices, denoted by $\|A\| = \max\{\, \|A\mathbf{u}\| : \|\mathbf{u}\| = 1 \,\}$. The following result relates the magnitude of the norm of a matrix to convergence of the associated iterative system.

Proposition 9.20. If $A$ is a square matrix, then $\|A^k\| \le \|A\|^k$. In particular, if $\|A\| < 1$, then $\|A^k\| \to 0$ as $k \to \infty$, and hence $A$ is a convergent matrix: $A^k \to O$.

The first part is a restatement of Proposition 3.22, and the second part is an immediate consequence. The converse to this result is not quite true; a convergent matrix does not necessarily have matrix norm less than 1, or even $\le 1$ (see Example 9.23 below). An alternative proof of Proposition 9.20 can be based on the following useful estimate:

† We work with real iterative systems throughout this chapter, but the methods readily extend to their complex counterparts.


Theorem 9.21. The spectral radius of a matrix is bounded by its matrix norm:
$$\rho(A) \le \|A\|. \tag{9.30}$$


Proof: If $\lambda$ is a real eigenvalue, and $\mathbf{u}$ a corresponding unit eigenvector, so that $A\mathbf{u} = \lambda\mathbf{u}$ with $\|\mathbf{u}\| = 1$, then
$$\|A\mathbf{u}\| = \|\lambda\mathbf{u}\| = |\lambda|\,\|\mathbf{u}\| = |\lambda|. \tag{9.31}$$
Since $\|A\|$ is the maximum of $\|A\mathbf{u}\|$ over all possible unit vectors, this implies that
$$|\lambda| \le \|A\|. \tag{9.32}$$
If all the eigenvalues of $A$ are real, then the spectral radius is the maximum of their absolute values, and so it too is bounded by $\|A\|$, proving (9.30).

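If $A$ has complex eigenvalues, then we need to work a little harder to establish (9.32). (This is because the matrix norm is defined by the effect of $A$ on real vectors, and so we cannot directly use the complex eigenvectors to establish the required bound.) Let $\lambda = r\,e^{i\theta}$ be a complex eigenvalue with complex eigenvector $\mathbf{z} = \mathbf{x} + i\,\mathbf{y}$. Define
$$m = \min\{\, \|\mathrm{Re}\,(e^{i\varphi}\mathbf{z})\| = \|(\cos\varphi)\,\mathbf{x} - (\sin\varphi)\,\mathbf{y}\| \;:\; 0 \le \varphi \le 2\pi \,\}. \tag{9.33}$$
Since the indicated subset is a closed curve (in fact, an ellipse) that does not go through the origin†, $m > 0$. Let $\varphi_0$ denote the value of the angle that produces the minimum, so
$$m = \|(\cos\varphi_0)\,\mathbf{x} - (\sin\varphi_0)\,\mathbf{y}\| = \|\mathrm{Re}\,(e^{i\varphi_0}\mathbf{z})\|.$$
Define the real unit vector
$$\mathbf{u} = \frac{\mathrm{Re}\,(e^{i\varphi_0}\mathbf{z})}{m} = \frac{(\cos\varphi_0)\,\mathbf{x} - (\sin\varphi_0)\,\mathbf{y}}{m}, \qquad \text{so that} \qquad \|\mathbf{u}\| = 1.$$
Then
$$A\mathbf{u} = \frac{1}{m}\,\mathrm{Re}\,(e^{i\varphi_0}A\mathbf{z}) = \frac{1}{m}\,\mathrm{Re}\,(e^{i\varphi_0}\,r\,e^{i\theta}\mathbf{z}) = \frac{r}{m}\,\mathrm{Re}\,(e^{i(\varphi_0+\theta)}\mathbf{z}).$$
Therefore, keeping in mind that $m$ is the minimal value in (9.33),
$$\|A\| \ge \|A\mathbf{u}\| = \frac{r}{m}\,\|\mathrm{Re}\,(e^{i(\varphi_0+\theta)}\mathbf{z})\| \ge r = |\lambda|, \tag{9.34}$$
and so (9.32) also holds for complex eigenvalues.

Q.E.D.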
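Both Proposition 9.20 and Theorem 9.21 are easy to test on a small example. The following sketch uses a hypothetical $2 \times 2$ matrix with complex eigenvalues $.3 \pm .4\,i$ (chosen only for illustration, it does not appear in the text) and the $\infty$ matrix norm. The function names are likewise illustrative.

```python
import cmath


def inf_norm(a):
    """Maximal absolute row sum, i.e. the infinity matrix norm."""
    return max(sum(abs(x) for x in row) for row in a)


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_radius_2x2(a):
    """Largest eigenvalue modulus of a 2x2 matrix via the quadratic formula."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + root) / 2), abs((trace - root) / 2))


# Hypothetical test matrix with complex eigenvalues 0.3 +/- 0.4i.
A = [[0.3, -0.4], [0.4, 0.3]]
rho, norm = spectral_radius_2x2(A), inf_norm(A)
print(f"rho(A) = {rho:.4f} <= ||A|| = {norm:.4f}")

# Compare ||A^k|| with ||A||^k for the first few powers.
power = A
for k in range(1, 9):
    print(k, f"||A^k|| = {inf_norm(power):.6f}", f"||A||^k = {norm ** k:.6f}")
    power = matmul(power, A)
```

Since the eigenvalues here are complex, this example exercises the second half of the proof, where the bound cannot be read off from a real eigenvector.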


Let us see what the convergence criterion of Proposition 9.20 says for a couple of our well-known matrix norms. First, the formula (3.44) for the $\infty$ norm implies the following convergence criterion.

Proposition 9.22. If all the absolute row sums of $A$ are strictly less than 1, then $\|A\|_\infty < 1$ and hence $A$ is a convergent matrix.


† This relies on the fact that $\mathbf{x}, \mathbf{y}$ are linearly independent, which was shown in Exercise 8.3.12.

Example 9.23. Consider the symmetric matrix $A = \begin{pmatrix} \tfrac12 & -\tfrac13 \\ -\tfrac13 & \tfrac14 \end{pmatrix}$. Its two absolute row sums are $\left|\tfrac12\right| + \left|-\tfrac13\right| = \tfrac56$ and $\left|-\tfrac13\right| + \left|\tfrac14\right| = \tfrac{7}{12}$, so
$$\|A\|_\infty = \max\{\tfrac56, \tfrac{7}{12}\} = \tfrac56 = .83333\ldots\,.$$
Since the norm is less than 1, $A$ is a convergent matrix. Indeed, its eigenvalues are
$$\lambda_1 = \frac{9 + \sqrt{73}}{24} = .731000\ldots, \qquad \lambda_2 = \frac{9 - \sqrt{73}}{24} = .018999\ldots,$$
and hence the spectral radius is $\rho(A) = \lambda_1 = .731000\ldots$, which is slightly smaller than its $\infty$ norm.

The row sum test for convergence is not always conclusive. For example, the matrix
$$A = \begin{pmatrix} \tfrac12 & \tfrac35 \\ \tfrac35 & \tfrac14 \end{pmatrix} \qquad \text{has matrix norm} \qquad \|A\|_\infty = \tfrac{11}{10} > 1.$$
On the other hand, its eigenvalues are $\dfrac{15 \pm \sqrt{601}}{40}$, and hence its spectral radius is $\rho(A) = \dfrac{15 + \sqrt{601}}{40} = .987882\ldots$, which implies that $A$ is (just barely) convergent, even though its maximal row sum is larger than 1.
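The two outcomes of the row sum test can be compared side by side. The sketch below assumes the matrix entries $\tfrac12, -\tfrac13, -\tfrac13, \tfrac14$ for Example 9.23 and $\tfrac12, \tfrac35, \tfrac35, \tfrac14$ for the second matrix; these are the values consistent with the row sums and eigenvalues quoted in the text. The helper names are illustrative, and the closed-form eigenvalue formula applies to symmetric $2 \times 2$ matrices only.

```python
import math


def inf_norm(a):
    """Maximal absolute row sum, i.e. the infinity matrix norm."""
    return max(sum(abs(x) for x in row) for row in a)


def symmetric_eigenvalues_2x2(a):
    """Return both eigenvalues of a symmetric 2x2 matrix, larger one first."""
    p, q, r = a[0][0], a[0][1], a[1][1]
    mean = (p + r) / 2
    radius = math.hypot((p - r) / 2, q)
    return mean + radius, mean - radius


def row_sum_report(name, a):
    """Print and return the infinity norm and spectral radius of a."""
    norm = inf_norm(a)
    rho = max(abs(ev) for ev in symmetric_eigenvalues_2x2(a))
    verdict = "convergent by the row sum test" if norm < 1 else "row sum test inconclusive"
    print(f"{name}: norm = {norm:.6f}, rho = {rho:.6f}, {verdict}")
    return norm, rho


A = [[1 / 2, -1 / 3], [-1 / 3, 1 / 4]]  # assumed entries of Example 9.23
B = [[1 / 2, 3 / 5], [3 / 5, 1 / 4]]    # assumed entries of the second matrix
norm_A, rho_A = row_sum_report("A", A)
norm_B, rho_B = row_sum_report("B", B)
```

For the second matrix the norm exceeds 1 while the spectral radius does not, which is exactly the situation in which Proposition 9.22 gives no information.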

Similarly, using the formula (8.61) for the Euclidean matrix norm, one deduces a convergence criterion based on the magnitude of the singular values.

Proposition 9.24. If $A$ is a square matrix whose largest singular value satisfies $\sigma_1 < 1$, then $\|A\|_2 < 1$ and hence $A$ is a convergent matrix.


Example 9.25. Consider the matrix and associated Gram matrix
$$A = \begin{pmatrix} 0 & -\tfrac13 & \tfrac13 \\ \tfrac14 & 0 & \tfrac12 \\ \tfrac25 & \tfrac15 & 0 \end{pmatrix}, \qquad A^TA = \begin{pmatrix} .2225 & .0800 & .1250 \\ .0800 & .1511 & -.1111 \\ .1250 & -.1111 & .3611 \end{pmatrix}.$$
Then $A^TA$ has eigenvalues $\lambda_1 = .4472$, $\lambda_2 = .2665$, $\lambda_3 = .0210$, and hence the singular values of $A$ are their square roots: $\sigma_1 = .6687$, $\sigma_2 = .5163$, $\sigma_3 = .1448$. The Euclidean matrix norm of $A$ is the largest singular value, and so $\|A\|_2 = .6687$, proving that $A$ is a convergent matrix. Note that, as always, the matrix norm overestimates the spectral radius, which is $\rho(A) = .5$.
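The largest singular value in Example 9.25 can be approximated without a full eigenvalue solver. The sketch below assumes the matrix entries shown in the code (they reproduce the quoted Gram matrix) and applies plain power iteration to $A^TA$; this is a swapped-in technique for illustration, not necessarily how the values in the example were obtained. The function names and the iteration count are arbitrary choices.

```python
import math

# Matrix assumed for Example 9.25; it reproduces the quoted Gram matrix.
A = [[0, -1 / 3, 1 / 3],
     [1 / 4, 0, 1 / 2],
     [2 / 5, 1 / 5, 0]]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Return the matrix product a b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def largest_singular_value(a, steps=200):
    """Power iteration on the Gram matrix; returns the square root of its top eigenvalue."""
    gram_matrix = matmul(transpose(a), a)
    v = [1.0] * len(gram_matrix)
    for _ in range(steps):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram_matrix]
        length = math.sqrt(sum(x * x for x in w))
        v = [x / length for x in w]
    gv = [sum(g * x for g, x in zip(row, v)) for row in gram_matrix]
    rayleigh = sum(x * y for x, y in zip(v, gv))  # v has unit length
    return math.sqrt(rayleigh)


gram = matmul(transpose(A), A)
sigma_1 = largest_singular_value(A)
print([[round(x, 4) for x in row] for row in gram])
print(f"sigma_1 = ||A||_2 = {sigma_1:.4f}")
```

Power iteration converges here because the two largest eigenvalues of the Gram matrix, $.4472$ and $.2665$, are well separated.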

Unfortunately, as we discovered in Example 9.23, matrix norms are not a foolproof test of convergence. There exist convergent matrices, with $\rho(A) < 1$, that nevertheless have matrix norm $\|A\| \ge 1$. In such cases, the matrix norm is not able to predict convergence of the iterative system, although one should expect the convergence to be quite slow. Although such pathology might show up in the chosen matrix norm, it turns out that one can always rig up some matrix norm for which $\|A\| < 1$. This follows from a more general result, whose proof can be found in [62].

Theorem 9.26. Let $A$ have spectral radius $\rho(A)$. If $\varepsilon > 0$ is any positive number, then there exists a matrix norm $\|\cdot\|$ such that
$$\rho(A) \le \|A\| \le \rho(A) + \varepsilon. \tag{9.35}$$


Corollary 9.27. If $A$ is a convergent matrix, then there exists a matrix norm such that $\|A\| < 1$.

Proof: By definition, $A$ is convergent if and only if $\rho(A) < 1$. Choose $\varepsilon > 0$ such that $\rho(A) + \varepsilon < 1$. Any norm that then satisfies (9.35) has the desired property. Q.E.D.




It can also be proved, [48], that, given a matrix norm, $\displaystyle \lim_{n \to \infty} \|A^n\|^{1/n} = \rho(A)$, and hence, if $A$ is convergent, then $\|A^n\| < 1$ for $n$ sufficiently large.
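The limit formula can be watched in action on the matrix from the inconclusive row sum test. The sketch below assumes that matrix has entries $\tfrac12, \tfrac35, \tfrac35, \tfrac14$, so that $\|A\|_\infty = \tfrac{11}{10}$ and $\rho(A) = (15 + \sqrt{601})/40$. It tabulates $\|A^n\|_\infty^{1/n}$ and reports the first power whose norm drops below 1. The names `gelfand_estimates` and `first_below_one` are illustrative.

```python
def inf_norm(a):
    """Maximal absolute row sum, i.e. the infinity matrix norm."""
    return max(sum(abs(x) for x in row) for row in a)


def matmul(a, b):
    """Return the matrix product a b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gelfand_estimates(a, n_max):
    """Return the tuples (n, ||A^n||, ||A^n||^(1/n)) for n = 1, ..., n_max."""
    rows, power = [], a
    for n in range(1, n_max + 1):
        norm = inf_norm(power)
        rows.append((n, norm, norm ** (1 / n)))
        power = matmul(power, a)
    return rows


B = [[0.5, 0.6], [0.6, 0.25]]        # assumed entries, see the lead-in
RHO = (15 + 601 ** 0.5) / 40         # spectral radius quoted in the text

rows = gelfand_estimates(B, 200)
first_below_one = next(n for n, norm, _ in rows if norm < 1)

for n, norm, root in rows[:10] + rows[-1:]:
    print(f"n = {n:3d}  ||A^n|| = {norm:.6f}  ||A^n||^(1/n) = {root:.6f}")
print("rho(A) =", round(RHO, 6), " first n with ||A^n|| < 1:", first_below_one)
```

Every entry in the last column is at least $\rho(A)$, as Theorem 9.21 applied to $A^n$ requires, and the values decrease toward $\rho(A)$ only slowly.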

Warning. Based on the accumulated evidence, one might be tempted to speculate that the spectral radius itself defines a matrix norm. Unfortunately, this is not the case. For example, the nonzero matrix $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ has zero spectral radius, $\rho(A) = 0$, in violation of a basic norm axiom.


Exercises 


9.2.30. Compute the $\infty$ matrix norm of the following matrices. Which are guaranteed to be


convergent?  (a) 


(e) 


/  2 
7 


2 

7 


o  f 


V  7 


4 

7 


_4  \ 
7 
6 
7 

1  ) 


(0 


-  (b) 

(  0  .1 

-.1  0 

\  —.8  -.1 


5 
3 
7 

6 


(C) 


( 


(g) 


1  -A 


v  h 


2 
7 
2 
7 

_  2  _  2  \ 
3  3 

A  —1 

I  °/ 


(d) 


(h) 


/ 


V 


1 
4 
1 

2 

3  0 

-  3  0  3 


0 


\ 


0  2  1  / 

u  3  3  / 


9.2.31.  Compute  the  Euclidean  matrix  norm  of  each  matrix  in  Exercise  9.2.30.  Have  your 
convergence  conclusions  changed? 

9.2.32.  Compute  the  spectral  radii  of  the  matrices  in  Exercise  9.2.30.  Which  are  convergent? 
Compare  your  conclusions  with  those  of  Exercises  9.2.30  and  9.2.31. 

9.2.33. Let $k$ be an integer and set $A_k = \begin{pmatrix} k & -1 \\ k^2 & -k \end{pmatrix}$. Compute (a) $\|A_k\|_\infty$, (b) $\|A_k\|_2$, (c) $\rho(A_k)$. (d) Explain why every $A_k$ is a convergent matrix, even though their matrix norms can be arbitrarily large. (e) Why does this not contradict Corollary 9.27?

9.2.34. Show that if $|c| < 1/\|A\|$, then $cA$ is a convergent matrix.

♦ 9.2.35. Prove that the spectral radius function does not satisfy the triangle inequality by finding matrices $A, B$ such that $\rho(A + B) > \rho(A) + \rho(B)$.

9.2.36.  Find  a  convergent  matrix  that  has  dominant  singular  value  a1  >  1. 

♦ 9.2.37. Prove that if $A$ is a real symmetric matrix, then its Euclidean matrix norm is equal to its spectral radius.

♦ 9.2.38. Let $A$ be a square matrix. Let $s = \max\{s_1, \ldots, s_n\}$ be the maximal absolute row sum of $A$ and let $t = \min\{\, |a_{ii}| - r_i \,\}$, with $r_i$ given by (8.27). Prove that $\max\{0, t\} \le \rho(A) \le s$.


9.2.39. Suppose the largest entry (in modulus) of $A$ is $|a_{ij}| = a$. Can you bound its radius of convergence?

9.2.40. (a) Suppose that every entry of the $n \times n$ matrix $A$ is bounded by $|a_{ij}| < 1/n$. Prove that $A$ is a convergent matrix. Hint: Use Exercise 9.2.38. (b) Produce a matrix of size $n \times n$ with one or more entries satisfying $|a_{ij}| = 1/n$ that is not convergent.

9.2.41.  Write  down  an  example  of  a  strictly  diagonally  dominant  matrix  that  is  also 
convergent. 

9.2.42. True or false: If $B = S^{-1}AS$ are similar matrices, then (a) $\|B\|_\infty = \|A\|_\infty$, (b) $\|B\|_2 = \|A\|_2$, (c) $\rho(B) = \rho(A)$.


9.2.43.  Prove  that  the  curve  parametrized  in  (9.33)  is  an  ellipse.  What  are  its  semi-axes? 


♦ 9.2.44. (a) Prove that the individual entries $a_{ij}$ of a matrix $A$ are bounded in absolute value by its $\infty$ matrix norm: $|a_{ij}| \le \|A\|_\infty$. (b) Prove that if the series $\sum_{n=0}^{\infty} \|A_n\|_\infty < \infty$ converges, then the matrix series $\sum_{n=0}^{\infty} A_n = A^*$ converges to some matrix $A^*$. (c) Let $\|A\|$ denote any natural matrix norm. Prove that if the series $\sum_{n=0}^{\infty} \|A_n\| < \infty$ converges, then the matrix series $\sum_{n=0}^{\infty} A_n = A^*$ converges.


9.2.45. (a) Use Exercise 9.2.44 to prove that the geometric matrix series $\sum_{n=0}^{\infty} A^n$ converges whenever $\rho(A) < 1$. Hint: Apply Corollary 9.27. (b) Prove that the sum equals $(I - A)^{-1}$. How do you know $I - A$ is invertible?


9.3  Markov  Processes 

A discrete probabilistic process in which the future state of a system depends only upon its current configuration is known as a Markov chain, to honor the pioneering early twentieth century studies of the Russian mathematician Andrei Markov. Markov chains are described by linear iterative systems whose coefficient matrices have a special form. They define the simplest examples of stochastic processes, [4,23], which have many profound physical, biological, economic, and statistical applications, including networks, internet search engines, speech recognition, and routing.

To  take  a  very  simple  (albeit  slightly  artificial)  example,  suppose  you  would  like  to  be 
able  to  predict  the  weather  in  your  city.  Consulting  local  weather  records  over  the  past 
decade,  you  determine  that 

(a)  If  today  is  sunny,  there  is  a  70%  chance  that  tomorrow  will  also  be  sunny, 

(b)  But,  if  today  is  cloudy,  the  chances  are  80%  that  tomorrow  will  also  be  cloudy. 

Question:  given  that  today  is  sunny,  what  is  the  probability  that  next  Saturday’s  weather 
will  also  be  sunny? 

To formulate this process mathematically, we let $s^{(k)}$ denote the probability that day $k$ is sunny and $c^{(k)}$ the probability that it is cloudy. If we assume that these are the only possibilities, then the individual probabilities must sum to 1, so
$$s^{(k)} + c^{(k)} = 1.$$
According to our data, the probability that the next day is sunny or cloudy is expressed by the equations
$$s^{(k+1)} = .7\,s^{(k)} + .2\,c^{(k)}, \qquad c^{(k+1)} = .3\,s^{(k)} + .8\,c^{(k)}. \tag{9.36}$$
Indeed, day $k + 1$ could be sunny either if day $k$ was, with a 70% chance, or, if day $k$ was cloudy, there is still a 20% chance of day $k + 1$ being sunny. We rewrite (9.36) in a more convenient matrix form:
$$\mathbf{u}^{(k+1)} = T\,\mathbf{u}^{(k)}, \qquad \text{where} \qquad T = \begin{pmatrix} .7 & .2 \\ .3 & .8 \end{pmatrix}, \qquad \mathbf{u}^{(k)} = \begin{pmatrix} s^{(k)} \\ c^{(k)} \end{pmatrix}. \tag{9.37}$$
In a Markov process, the vector of probabilities $\mathbf{u}^{(k)}$ is known as the $k$th state vector and the matrix $T$ is known as the transition matrix, whose entries fix the transition probabilities between the various states.
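Figure 9.4. The Set of Probability Vectors in $\mathbb{R}^3$.

By assumption, the initial state vector is $\mathbf{u}^{(0)} = (1, 0)^T$, since we know for certain that today is sunny. Rounded off to three decimal places, the subsequent state vectors are
$$\mathbf{u}^{(1)} = \begin{pmatrix} .7 \\ .3 \end{pmatrix}, \quad \mathbf{u}^{(2)} = \begin{pmatrix} .55 \\ .45 \end{pmatrix}, \quad \mathbf{u}^{(3)} = \begin{pmatrix} .475 \\ .525 \end{pmatrix}, \quad \mathbf{u}^{(4)} = \begin{pmatrix} .438 \\ .563 \end{pmatrix},$$
$$\mathbf{u}^{(5)} = \begin{pmatrix} .419 \\ .581 \end{pmatrix}, \quad \mathbf{u}^{(6)} = \begin{pmatrix} .409 \\ .591 \end{pmatrix}, \quad \mathbf{u}^{(7)} = \begin{pmatrix} .405 \\ .595 \end{pmatrix}, \quad \mathbf{u}^{(8)} = \begin{pmatrix} .402 \\ .598 \end{pmatrix}.$$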

The  iterates  converge  fairly  rapidly  to  (.4,  .6)T,  which  is,  in  fact,  a  fixed  point  for  the 
iterative  system  (9.37).  Thus,  in  the  long  run,  40%  of  the  days  will  be  sunny  and  60%  will 
be  cloudy.  Let  us  explain  why  this  happens. 
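The state vectors of the weather model are straightforward to generate. The following sketch iterates (9.37) with exact fractions, starting from a sunny day; the function names `step` and `markov_chain` are illustrative, and state 0 is taken to mean sunny and state 1 cloudy.

```python
from fractions import Fraction as F

# Transition matrix of the weather model: column j holds the probabilities of
# moving from state j (0 = sunny, 1 = cloudy) to each state.
T = [[F(7, 10), F(2, 10)],
     [F(3, 10), F(8, 10)]]


def step(matrix, state):
    """Advance the state vector by one day: u -> T u."""
    return [sum(t * u for t, u in zip(row, state)) for row in matrix]


def markov_chain(matrix, initial, steps):
    """Return the state vectors u^(0), ..., u^(steps)."""
    states = [list(initial)]
    for _ in range(steps):
        states.append(step(matrix, states[-1]))
    return states


states = markov_chain(T, [F(1), F(0)], 30)
for k in range(1, 9):
    print(k, [f"{float(p):.3f}" for p in states[k]])
print("after 30 days:", [float(p) for p in states[30]])
```

Each state vector sums to 1 exactly, and the first entry approaches $.4$ with the error halving at every step.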


Definition 9.28. A vector $\mathbf{u} = (u_1, u_2, \ldots, u_n)^T \in \mathbb{R}^n$ is called a probability vector if all its entries lie between 0 and 1, so $0 \le u_i \le 1$ for $i = 1, \ldots, n$, and, moreover, their sum is $u_1 + \cdots + u_n = 1$.


We interpret the entry $u_i$ of a probability vector as the probability that the system is in state number $i$. The fact that the entries add up to 1 means that they represent a complete list of probabilities for the possible states of the system. The set of probability vectors defines an $(n-1)$-dimensional simplex in $\mathbb{R}^n$. For example, the possible probability vectors $\mathbf{u} \in \mathbb{R}^3$ fill the equilateral triangle plotted in Figure 9.4.

Remark. Every nonzero vector $\mathbf{0} \ne \mathbf{v} = (v_1, v_2, \ldots, v_n)^T$ with all non-negative entries, $v_i \ge 0$ for $i = 1, \ldots, n$, can be converted into a parallel probability vector by dividing by the sum of its entries:
$$\mathbf{u} = \frac{\mathbf{v}}{v_1 + \cdots + v_n}. \tag{9.38}$$
For example, if $\mathbf{v} = (3, 2, 0, 1)^T$, then $\mathbf{u} = (\tfrac12, \tfrac13, 0, \tfrac16)^T$ is the corresponding probability vector.
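Formula (9.38) translates directly into code. The sketch below is one possible rendering; the function name `to_probability_vector` and the choice to raise `ValueError` on invalid input are assumptions made for illustration.

```python
from fractions import Fraction


def to_probability_vector(v):
    """Divide a nonzero vector with non-negative entries by the sum of its entries."""
    if any(x < 0 for x in v):
        raise ValueError("all entries must be non-negative")
    total = sum(v)
    if total == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [Fraction(x) / total for x in v]


u = to_probability_vector([3, 2, 0, 1])
print(u)
```

Using `Fraction` keeps integer input exact, so the entries of the result sum to 1 with no rounding.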


In general, a Markov chain is represented by a first order linear iterative system
$$\mathbf{u}^{(k+1)} = T\,\mathbf{u}^{(k)}, \tag{9.39}$$
whose initial state $\mathbf{u}^{(0)}$ is a probability vector. The entries of the transition matrix $T$ must satisfy
$$0 \le t_{ij} \le 1, \qquad t_{1j} + \cdots + t_{nj} = 1. \tag{9.40}$$
The entry $t_{ij}$ represents the transitional probability that the system will switch from state $j$ to state $i$. (Note the reversal of indices.) Since this covers all possible transitions, the column sums of the transition matrix are all equal to 1, and hence each column of $T$ is a probability vector, which is equivalent to condition (9.40). In Exercise 9.3.24 you are asked to show that, under these assumptions, if $\mathbf{u}^{(k)}$ is a probability vector, then so is $\mathbf{u}^{(k+1)} = T\,\mathbf{u}^{(k)}$, and hence, given our assumption on the initial state, the solution $\mathbf{u}^{(k)} = T^k\,\mathbf{u}^{(0)}$ to the Markov process defines a sequence, or “chain”, of probability vectors.

Let  us  now  investigate  the  convergence  of  the  Markov  chain.  Not  all  Markov  chains 
converge  —  see  Exercise  9.3.9  for  an  example  —  and  so  we  impose  some  additional  mild 
restrictions  on  the  transition  matrix. 

Definition 9.29. A transition matrix (9.40) is regular if some power $T^k$ contains no zero entries. In particular, if $T$ itself has no zero entries, then it is regular.

Warning.  The  term  “regular  transition  matrix”  has  nothing  to  do  with  our  earlier  term 
“regular  matrix”,  which  was  used  to  describe  matrices  with  an  LU  factorization. 

The entries of $T^k$ describe the transition probabilities of getting from one state to another in $k$ steps. Thus, regularity of the transition matrix means that there is a nonzero probability of getting from any state to any other state in exactly $k$ steps for some $k \ge 1$.
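Conditions (9.40) and the regularity condition of Definition 9.29 can both be tested mechanically. The sketch below is one way to do so; the function names and the tolerance are illustrative. Since regularity depends only on which entries are nonzero, the powers are tracked as boolean patterns. The search stops at the power $(n-1)^2 + 1$, relying on Wielandt's bound for primitive non-negative matrices, a result that is not part of this text.

```python
def is_transition_matrix(t, tol=1e-12):
    """Check (9.40): entries in [0, 1] and every column summing to 1."""
    n = len(t)
    if any(len(row) != n for row in t):
        return False
    entries_ok = all(0 <= x <= 1 for row in t for x in row)
    columns_ok = all(abs(sum(t[i][j] for i in range(n)) - 1) <= tol
                     for j in range(n))
    return entries_ok and columns_ok


def is_regular(t):
    """Return True if some power of t has no zero entries.

    Only the zero pattern matters, so powers are stored as boolean matrices.
    Powers up to (n - 1)^2 + 1 are examined (Wielandt's bound).
    """
    n = len(t)
    pattern = [[x > 0 for x in row] for row in t]
    power = pattern
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in power):
            return True
        power = [[any(power[i][k] and pattern[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return False


weather = [[0.7, 0.2], [0.3, 0.8]]   # transition matrix (9.37)
swap = [[0, 1], [1, 0]]              # states alternate forever
mixed = [[0, 0.5], [1, 0.5]]         # has a zero entry, but its square does not

for name, t in (("weather", weather), ("swap", swap), ("mixed", mixed)):
    print(name, is_transition_matrix(t), is_regular(t))
```

The matrix `mixed` illustrates why the definition refers to powers: it has a zero entry, yet every state can reach every other state in exactly two steps.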

The  asymptotic  behavior  of  a  regular  Markov  chain  is  governed  by  the  following  basic 
result,  originally  due  to  the  German  mathematicians  Oskar  Perron  and  Georg  Frobenius 
in  the  early  part  of  the  twentieth  century.  A  proof  can  be  found  at  the  end  of  this  section. 

Theorem 9.30. If $T$ is a regular transition matrix, then it admits a unique probability eigenvector $\mathbf{u}^*$ with eigenvalue $\lambda_1 = 1$. Moreover, a Markov chain with coefficient matrix $T$ will converge to the probability eigenvector: $\mathbf{u}^{(k)} \to \mathbf{u}^*$ as $k \to \infty$.

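Example 9.31. The eigenvalues and eigenvectors of the weather transition matrix (9.37) are
$$\lambda_1 = 1,\ \ \mathbf{v}_1 = \begin{pmatrix} \tfrac23 \\ 1 \end{pmatrix}, \qquad \lambda_2 = .5,\ \ \mathbf{v}_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$
The first eigenvector is then converted into a probability vector via formula (9.38):
$$\mathbf{u}^* = \frac{1}{\tfrac23 + 1} \begin{pmatrix} \tfrac23 \\ 1 \end{pmatrix} = \begin{pmatrix} \tfrac25 \\ \tfrac35 \end{pmatrix}.$$
This distinguished probability eigenvector represents the final asymptotic state of the system after many iterations, no matter what the initial state is. Thus, our earlier observation that about 40% of the days will be sunny and 60% will be cloudy does not depend upon today’s weather.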

Example 9.32. A taxi company in Minnesota serves the cities of Minneapolis and St. Paul, as well as the nearby suburbs. Records indicate that, on average, 10% of the customers taking a taxi in Minneapolis go to St. Paul and 30% go to the suburbs. Customers boarding in St. Paul have a 30% chance of going to Minneapolis and a 30% chance of going to the suburbs, while suburban customers choose Minneapolis 40% of the time and St. Paul 30% of the time. The owner of the taxi company is interested in knowing where the taxis will end up, on average. Let us write this as a Markov process. The entries of the state vector $\mathbf{u}^{(k)} = (u_1^{(k)}, u_2^{(k)}, u_3^{(k)})^T$ tell what proportion of the taxi fleet is, respectively, in Minneapolis, St. Paul, and the suburbs, or, equivalently, the probability that an individual taxi will be in one of the three locations. Using the given data, we construct the relevant transition matrix
$$T = \begin{pmatrix} .6 & .3 & .4 \\ .1 & .4 & .3 \\ .3 & .3 & .3 \end{pmatrix}.$$
Note that $T$ is regular since it has no zero entries. The probability eigenvector
$$\mathbf{u}^* \approx (.4714, .2286, .3)^T$$
corresponding to the unit eigenvalue $\lambda_1 = 1$ is found by first solving the linear system $(T - I)\mathbf{v}^* = \mathbf{0}$ and then converting the solution† $\mathbf{v}^*$ into a valid probability vector $\mathbf{u}^*$ by use of formula (9.38). According to Theorem 9.30, no matter how the taxis are initially distributed, eventually about 47% of the taxis will be in Minneapolis, 23% in St. Paul, and 30% in the suburbs. This can be confirmed by running numerical experiments. Moreover, if the owner places this fraction of the taxis in the three locations, then they will more or less remain in such proportions forever.
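A numerical experiment of the kind mentioned in Example 9.32 might look as follows. The transition matrix is built from the percentages stated in the example (the diagonal entries are the remaining probabilities in each column), and the exact probability eigenvector $(\tfrac{33}{70}, \tfrac{16}{70}, \tfrac{21}{70})^T$ was obtained by solving $(T - I)\mathbf{v} = \mathbf{0}$ by hand. The function names, tolerance, and starting distributions are illustrative choices.

```python
from fractions import Fraction as F

# Columns: trips starting in Minneapolis, in St. Paul, in the suburbs.
T = [[F(6, 10), F(3, 10), F(4, 10)],
     [F(1, 10), F(4, 10), F(3, 10)],
     [F(3, 10), F(3, 10), F(3, 10)]]


def step(matrix, state):
    """Advance the state vector by one step: u -> T u."""
    return [sum(t * u for t, u in zip(row, state)) for row in matrix]


def steady_state(matrix, initial, tol=1e-12, max_steps=1000):
    """Iterate u -> T u in floating point until successive states agree to tol."""
    fmatrix = [[float(x) for x in row] for row in matrix]
    state = [float(x) for x in initial]
    for _ in range(max_steps):
        new_state = step(fmatrix, state)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state
        state = new_state
    raise RuntimeError("no convergence within max_steps")


u_star = [F(33, 70), F(16, 70), F(21, 70)]   # exact probability eigenvector
starts = ([1, 0, 0], [0, 1, 0], [F(1, 3), F(1, 3), F(1, 3)])
limits = [steady_state(T, start) for start in starts]

for start, limit in zip(starts, limits):
    print([float(x) for x in start], "->", [round(x, 4) for x in limit])
```

All three starting distributions lead to the same limit, as Theorem 9.30 predicts for a regular transition matrix.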


Remark. As noted earlier (see Proposition 9.17), the convergence rate of the Markov chain to its steady state is governed by the size of the subdominant eigenvalue $\lambda_2$. The smaller $|\lambda_2|$ is, the faster the process converges. In the taxi example, $\lambda_2 = .3$ (and $\lambda_3 = 0$), and so the convergence to steady state is fairly rapid.


A  Markov  process  can  also  be  viewed  as  a  weighted  digraph.  Each  state  corresponds 
to  a  vertex.  A  nonzero  transition  probability  from  one  state  to  another  corresponds  to  a 
weighted  directed  edge  between  the  two  vertices.  Note  that  the  digraph  is  typically  not 
simple,  since  vertices  can  have  two  edges  connecting  them,  one  representing  the  transition 
probability  of  getting  from  the  first  to  the  second,  and  the  second  edge  representing  the 
transition  probability  of  going  in  the  other  direction.  The  original  PageRank  algorithm 
that  underlies  Google’s  search  engine,  [64,52],  starts  with  the  internet  digraph,  whose 
vertices  are  web  pages  and  whose  directed  edges  represent  links  from  one  web  page  to 
another,  which  are  weighted  according  to  the  number  of  such  links.  To  be  effective,  the 
resulting  weighted  internet  digraph  is  supplemented  by  adding  in  a  number  of  random  low 
weight  edges.  One  then  computes  the  probability  eigenvector  associated  with  the  resulting 
digraph-based  Markov  process,  the  magnitudes  of  whose  entries,  indexed  by  the  nodes, 
effectively  rank  the  corresponding  web  pages. 

Proof of Theorem 9.30: We begin the proof by replacing $T$ by its transpose‡ $M = T^T$, keeping in mind that every eigenvalue of $T$ is also an eigenvalue of $M$, albeit with different eigenvectors, cf. Proposition 8.12. The conditions (9.40) tell us that the matrix $M$ has entries $0 \le m_{ij} = t_{ji} \le 1$, and, moreover, the row sums $s_i = \sum_{j=1}^{n} m_{ij} = 1$ of $M$, being the same as the corresponding column sums of $T$, are all equal to 1. Since $M^k = (T^k)^T$, regularity of $T$ implies that some power $M^k$ has all positive entries.

According to Exercise 1.2.29, if $\mathbf{z} = (1, \ldots, 1)^T$ is the column vector all of whose entries are equal to 1, then the entries of $M\mathbf{z}$ are the row sums of $M$. Therefore, $M\mathbf{z} = \mathbf{z}$, which implies that $\mathbf{z}$ is an eigenvector of $M$ with eigenvalue $\lambda_1 = 1$. As a consequence, $T$ also has 1 as an eigenvalue. Observe that $\mathbf{z}$ is not in general an eigenvector of $T$; indeed, it satisfies the co-eigenvector equation $M\mathbf{z} = T^T\mathbf{z} = \mathbf{z}$.

† Theorem 9.30 guarantees that there is an eigenvector $\mathbf{v}$ with all non-negative entries.

‡ We apologize for the unfortunate clash of notation when writing the transpose of the matrix $T$.

Figure 9.5. Gershgorin Disks for a Regular Transition Matrix.

We claim that $\lambda_1 = 1$ is a simple eigenvalue. To this end, we prove that $\mathbf{z}$ spans the one-dimensional eigenspace $V_1$. In other words, we need to show that if $M\mathbf{v} = \mathbf{v}$, then its entries $v_1 = \cdots = v_n = a$ are all equal, and so $\mathbf{v} = a\,\mathbf{z}$ is a scalar multiple of the known eigenvector $\mathbf{z}$. Let us first prove this assuming that all of the entries of $M$ are strictly positive, and so $0 < m_{ij} = t_{ji} < 1$ for all $i, j$. Suppose $\mathbf{v}$ is an eigenvector with not all equal entries. Let $v_k$ be the minimal entry of $\mathbf{v}$, so $v_k \le v_i$ for all $i \ne k$, and at least one inequality is strict, say $v_k < v_j$. Then the $k$th entry of the eigenvector equation $\mathbf{v} = M\mathbf{v}$ is
$$v_k = \sum_{j=1}^{n} m_{kj}\,v_j > \sum_{j=1}^{n} m_{kj}\,v_k = v_k,$$
where the strict inequality follows from the assumed positivity of the entries of $M$, and the final equality follows from the fact that $M$ has unit row sums. Thus, we are led to a contradiction, and the claim follows. If $M$ has one or more 0 entries, but $M^k$ has all positive entries, then we apply the previous argument to the equation $M^k\mathbf{v} = \mathbf{v}$, which follows from $M\mathbf{v} = \mathbf{v}$. If $\lambda_1 = 1$ is a complete eigenvalue, then we are finished. The proof that this is indeed the case is a bit technical, and we refer the reader to [4] for the details.

Finally, let us prove that all the other eigenvalues of $M$ are less than 1 in modulus. For this we appeal to the Gershgorin Circle Theorem 8.16. Suppose $M^k$ has all positive entries, denoted by $m^{(k)}_{ij} > 0$. Its Gershgorin disk $D_i$ is centered at $m^{(k)}_{ii} > 0$ and has radius $r_i = 1 - m^{(k)}_{ii} < 1$, since the $i$th row sum of $M^k$ equals 1. Thus the disk lies strictly inside the open unit disk $|z| < 1$ except for a single boundary point at $z = 1$; see Figure 9.5. The Circle Theorem 8.16 implies that all eigenvalues of $M^k$ except the unit eigenvalue $\lambda_1 = 1$ must lie strictly inside the unit disk. Since these are just the $k$th powers of the eigenvalues of $M$, the same holds for the eigenvalues themselves, so $|\lambda_j| < 1$ for $j \ge 2$.

Therefore, the matrix $M$, and, hence, also $T$, satisfies the hypotheses of Proposition 9.17. We conclude that the iterates $\mathbf{u}^{(k)} = T^k\mathbf{u}^{(0)} \to \mathbf{u}^*$ converge to a multiple of the probability eigenvector of $T$. If the initial condition $\mathbf{u}^{(0)}$ is a probability vector, then so is every subsequent state vector $\mathbf{u}^{(k)}$, and so their limit $\mathbf{u}^*$ must also be a probability vector. This completes the proof of the theorem. Q.E.D.
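The Gershgorin step of the proof can be made concrete with the taxi matrix of Example 9.32, which has no zero entries, so the power $k = 1$ already suffices. The sketch below assumes the transition matrix with columns $(.6, .1, .3)$, $(.3, .4, .3)$, $(.4, .3, .3)$ read off from the percentages in that example; the function name `gershgorin_disks` is illustrative.

```python
def gershgorin_disks(m):
    """Return (center, radius) per row: center m_ii, radius = sum of |m_ij| over j != i."""
    return [(row[i], sum(abs(x) for j, x in enumerate(row) if j != i))
            for i, row in enumerate(m)]


T = [[0.6, 0.3, 0.4],
     [0.1, 0.4, 0.3],
     [0.3, 0.3, 0.3]]
M = [list(col) for col in zip(*T)]   # transpose of T, so the rows of M sum to 1
disks = gershgorin_disks(M)

for center, radius in disks:
    # Each disk touches the unit circle only at z = 1: its rightmost point is
    # center + radius = 1, and its leftmost point center - radius exceeds -1.
    print(f"center {center:.1f}, radius {radius:.1f}, "
          f"leftmost point {center - radius:+.1f}")
```

Every disk has its rightmost point at $z = 1$ and otherwise stays inside the open unit disk, which is the configuration sketched in Figure 9.5.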
Exercises


9.3.1.  Determine  if  the  following  matrices  are  regular  transition  matrices.  If  so,  find  the 


associated  probability  eigenvector,  (a) 
9.3.2. A business executive is managing three branches, labeled A, B, and C, of a corporation. She never visits the same branch on consecutive days. If she visits branch A one day, she visits branch B the next day. If she visits either branch B or C that day, then the next day she is twice as likely to visit branch A as to visit branch B or C. Explain why the resulting transition matrix is regular. Which branch does she visit the most often in the long run?


9.3.3.  A  study  has  determined  that,  on  average,  a  man’s  occupation  depends  on  that  of  his 
father.  If  the  father  is  a  farmer,  there  is  a  30%  chance  that  the  son  will  be  a  blue  collar 
laborer,  a  30%  chance  he  will  be  a  white  collar  professional,  and  a  40%  chance  he  will  also 
be  a  farmer.  If  the  father  is  a  laborer,  there  is  a  30%  chance  that  the  son  will  also  be  one, 
a  60%  chance  he  will  be  a  professional,  and  a  10%  chance  he  will  be  a  farmer.  If  the  father 
is  a  professional,  there  is  a  70%  chance  that  the  son  will  also  be  one,  a  25%  chance  he  will 
be  a  laborer,  and  a  5%  chance  he  will  be  a  farmer,  (a)  What  is  the  probability  that  the 
grandson  of  a  farmer  will  also  be  a  farmer?  (b)  In  the  long  run,  what  proportion  of  the 
male  population  will  be  farmers? 

9.3.4.  The  population  of  an  island  is  divided  into  city  and  country  residents.  Each  year,  5%  of 
the  residents  of  the  city  move  to  the  country  and  15%  of  the  residents  of  the  country  move 
to  the  city.  In  2003,  35,000  people  live  in  the  city  and  25,000  in  the  country.  Assuming  no 
growth  in  the  population,  how  many  people  will  live  in  the  city  and  how  many  will  live  in 
the  country  between  the  years  2004  and  2008?  What  is  the  eventual  population  distribution 
of  the  island? 


9.3.5. A certain plant species has either red, pink, or white flowers, depending on its genotype. If you cross a pink plant with any other plant, the probability distribution of the offspring is prescribed by the transition matrix $T = \begin{pmatrix} .5 & .25 & 0 \\ .5 & .5 & .5 \\ 0 & .25 & .5 \end{pmatrix}$. On average, if you continue crossing with only pink plants, what percentage of the three types of flowers would you expect to see in your garden?


9.3.6. A genetic model describing inbreeding, in which mating takes place only between
individuals of the same genotype, is given by the Markov process $u^{(n+1)} = T u^{(n)}$, where

\[ T = \begin{pmatrix} 1 & \frac14 & 0 \\ 0 & \frac12 & 0 \\ 0 & \frac14 & 1 \end{pmatrix} \]

is the transition matrix and $u^{(n)} = (p_n, q_n, r_n)^T$, whose entries are, respectively,
the proportion of populations of genotype AA, Aa, aa in the nth generation.
Find the solution to this Markov process and analyze your result.

9.3.7. A student has the habit that if she doesn't study one night, she is 70% certain of
studying the next night. Furthermore, the probability that she studies two nights in a row
is 50%. How often does she study in the long run?

9.3.8. A traveling salesman visits the three cities of Atlanta, Boston, and Chicago. The matrix

\[ \begin{pmatrix} 0 & .5 & .5 \\ 1 & 0 & .5 \\ 0 & .5 & 0 \end{pmatrix} \]

describes the transition probabilities of his trips. Describe his travels in
words, and calculate how often he visits each city on average.

9.3.9. Explain why the irregular Markov process with transition matrix
$T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ does not reach a steady state.
Use a population model, as in Exercise 9.3.4, to interpret what is going on.

9.3.10.  A  bug  crawls  along  the  edges  of  the  pictured  triangular  lattice  with  six 
vertices.  Upon  arriving  at  a  vertex,  there  is  an  equal  probability  of  its  choosing 
any  edge  to  leave  the  vertex.  Set  up  the  Markov  chain  described  by  the 
bug’s  motion,  and  determine  how  often,  on  average,  it  visits  each  vertex. 


9.3.11.  Answer  Exercise  9.3.10  for  the  larger  triangular  lattice. 


9.3.12. Suppose the bug of Exercise 9.3.10 crawls along the edges of the
pictured square lattice. What can you say about its behavior?


♦ 9.3.13. Let T be a regular transition matrix with probability eigenvector v.
(a) Prove that $\lim_{k \to \infty} T^k = P = (\, v \ v \ \ldots \ v \,)$ is a matrix with every column equal to v.
(b) Explain why $(\, v \ v \ \ldots \ v \,)\, v = v$. (c) Prove directly that P is idempotent: $P^2 = P$.

9.3.14. Find $\lim_{k \to \infty} T^k$ when T =

9.3.15. Prove that, for all $0 \le p, q \le 1$ with $p + q > 0$, the probability eigenvector of the
transition matrix $T = \begin{pmatrix} 1 - p & q \\ p & 1 - q \end{pmatrix}$ is
$v = \left( \dfrac{q}{p + q}, \dfrac{p}{p + q} \right)^T$.

9.3.16. Describe the final state of a Markov chain with symmetric transition matrix $T = T^T$.

9.3.17. True or false: If $T$ and $T^T$ are both transition matrices, then $T = T^T$.

9.3.18. True or false: If $T$ is a transition matrix, so is $T^{-1}$.

9.3.19.  A  transition  matrix  is  called  doubly  stochastic  if  both  its  row  and  column  sums  are 
equal  to  1.  What  is  the  limiting  probability  state  of  a  Markov  chain  with  doubly  stochastic 
transition  matrix? 


9.3.20. True or false: The set of all probability vectors forms a subspace of $\mathbb{R}^n$.

9.3.21. Multiple choice: Every probability vector in $\mathbb{R}^n$ lies on the unit sphere for the
(a) 1 norm, (b) 2 norm, (c) $\infty$ norm, (d) all of the above, (e) none of the above.

9.3.22.  True  or  false:  Every  probability  eigenvector  of  a  regular  transition  matrix  has 
eigenvalue  equal  to  1. 

9.3.23. Write down an example of (a) an irregular transition matrix; (b) a regular transition
matrix that has one or more zero entries.


♦ 9.3.24. Let T be a transition matrix. Prove that if u is a probability vector, then so is v = Tu.

♦ 9.3.25. (a) Prove that if T and S are transition matrices, then so is their product TS.
(b) Prove that if T is a transition matrix, then so is $T^k$ for all $k \ge 0$.

9.4 Iterative Solution of Linear Algebraic Systems

In this section, we return to the most basic problem in linear algebra: solving the linear
algebraic system

\[ A u = b, \tag{9.41} \]

consisting of n equations in n unknowns. We assume that the $n \times n$ coefficient matrix A is
nonsingular, and so the solution $u = A^{-1} b$ is unique. For simplicity, we shall only consider
real systems here.

We will introduce several popular iterative methods that can be used to approximate
the solution for certain classes of coefficient matrices. The resulting algorithms will
provide an attractive alternative to Gaussian Elimination, particularly when one is dealing
with the large, sparse systems that arise in the numerical solution to differential equations.
One major advantage of an iterative technique is that, in favorable situations, it produces
progressively more and more accurate approximations to the solution, and hence, by
prolonging the iterations, can, at least in principle, compute the solution to any desired order
of accuracy. Moreover, even performing just a few iterations may produce a reasonable
approximation to the true solution, in stark contrast to Gaussian Elimination, where
one must continue the process through to the bitter end before any useful information can
be extracted. A partially completed Gaussian Elimination is of scant use! A significant
weakness is that iterative methods are not universally applicable, and their design relies
upon the detailed structure of the coefficient matrix.

We shall be attempting to solve the linear system (9.41) by replacing it with an iterative
system of the form

\[ u^{(k+1)} = T u^{(k)} + c, \qquad u^{(0)} = u_0, \tag{9.42} \]

in which T is an $n \times n$ matrix and $c \in \mathbb{R}^n$. This represents a slight generalization of our
earlier iterative system (9.1), in that the right-hand side is now an affine function of $u^{(k)}$.
Suppose that the solutions to the affine iterative system converge: $u^{(k)} \to u^\star$ as $k \to \infty$.
Then, by taking the limit of both sides of (9.42), we discover that the limit point $u^\star$ solves
the fixed-point equation

\[ u^\star = T u^\star + c. \tag{9.43} \]

Thus, we need to design our iterative system so that

(a) the solution to the fixed-point system $u = T u + c$ coincides with the solution to the
original system $A u = b$, and

(b) the iterates defined by (9.42) are known to converge to the fixed point. The more
rapid the convergence, the better.

Before exploring these issues in depth, let us look at a simple example.

Example 9.33. Consider the linear system

\[ 3x + y - z = 3, \qquad x - 4y + 2z = -1, \qquad -2x - y + 5z = 2, \tag{9.44} \]

which has the vectorial form $A u = b$, with

\[ A = \begin{pmatrix} 3 & 1 & -1 \\ 1 & -4 & 2 \\ -2 & -1 & 5 \end{pmatrix}, \qquad u = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \qquad b = \begin{pmatrix} 3 \\ -1 \\ 2 \end{pmatrix}. \]

One easy way to convert a linear system into a fixed-point form is to rewrite it as

\[ u = I u - A u + A u = (I - A) u + b = T u + c, \qquad \text{where} \quad T = I - A, \quad c = b. \]

In the present case,

\[ T = I - A = \begin{pmatrix} -2 & -1 & 1 \\ -1 & 5 & -2 \\ 2 & 1 & -4 \end{pmatrix}, \qquad c = b = \begin{pmatrix} 3 \\ -1 \\ 2 \end{pmatrix}. \]

The resulting iterative system $u^{(k+1)} = T u^{(k)} + c$ has the explicit form

\[ \begin{aligned}
x^{(k+1)} &= -2 x^{(k)} - y^{(k)} + z^{(k)} + 3, \\
y^{(k+1)} &= -x^{(k)} + 5 y^{(k)} - 2 z^{(k)} - 1, \\
z^{(k+1)} &= 2 x^{(k)} + y^{(k)} - 4 z^{(k)} + 2.
\end{aligned} \tag{9.45} \]

Another possibility is to solve the first equation in (9.44) for x, the second for y, and
the third for z, so that

\[ x = -\tfrac13 y + \tfrac13 z + 1, \qquad y = \tfrac14 x + \tfrac12 z + \tfrac14, \qquad z = \tfrac25 x + \tfrac15 y + \tfrac25. \]

The resulting equations have the form of a fixed-point system

\[ u = \widehat{T} u + \widehat{c}, \qquad \text{in which} \quad \widehat{T} = \begin{pmatrix} 0 & -\frac13 & \frac13 \\ \frac14 & 0 & \frac12 \\ \frac25 & \frac15 & 0 \end{pmatrix}, \qquad \widehat{c} = \begin{pmatrix} 1 \\ \frac14 \\ \frac25 \end{pmatrix}. \]

The corresponding iterative system $u^{(k+1)} = \widehat{T} u^{(k)} + \widehat{c}$ is

\[ \begin{aligned}
x^{(k+1)} &= -\tfrac13 y^{(k)} + \tfrac13 z^{(k)} + 1, \\
y^{(k+1)} &= \tfrac14 x^{(k)} + \tfrac12 z^{(k)} + \tfrac14, \\
z^{(k+1)} &= \tfrac25 x^{(k)} + \tfrac15 y^{(k)} + \tfrac25.
\end{aligned} \tag{9.46} \]

Do the resulting iterative methods converge to the solution $x = y = z = 1$, i.e., to
$u^\star = (1, 1, 1)^T$? The results, starting with initial guess $u^{(0)} = (0, 0, 0)^T$, are tabulated in
the accompanying table.

  k     $u^{(k+1)} = T u^{(k)} + b$              $u^{(k+1)} = \widehat{T} u^{(k)} + \widehat{c}$
  0           0           0           0        0         0         0
  1           3          -1           2        1         .25       .4
  2           0         -13          -1        1.05      .7        .85
  3          15         -64          -7        1.05      .9375     .96
  4          30        -322          -4        1.0075    .9925     1.0075
  5         261       -1633        -244        1.005     1.00562   1.0015
  6         870       -7939        -133        .9986     1.002     1.0031
  7        6069      -40300       -5665        1.0004    1.0012    .9999
  8       22500     -196240       -5500        .9995     1.0000    1.0004
  9      145743     -992701     -129238        1.0001    1.0001    .9998
 10      571980    -4850773     -184261        .9999     .9999     1.0001
 11     3522555   -24457324    -2969767        1.0000    1.0000    1.0000

For the first method, the answer is clearly no: the iterates become wilder and wilder.
Indeed, this occurs no matter how close the initial guess is to the actual solution,
unless $u^{(0)}$ happens to be exactly equal to $u^\star$. In the second case, the iterates do converge
to the solution, and it does not take too long, even starting from a poor initial guess, to
obtain a reasonably accurate approximation. Of course, in such a simple example, it would
be silly to use iteration, when Gaussian Elimination can be done by hand and produces the
solution almost immediately. However, we use the small examples for illustrative purposes,
in order to prepare us to bring the full power of iterative algorithms to bear on the large
linear systems arising in applications.
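
The two iterations of Example 9.33 are easy to replay numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `iterate` and the variable names are ours and do not come from the text. It runs both fixed-point schemes from the zero vector so that the tabulated iterates can be compared directly.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system (9.44).
A = np.array([[3.0, 1.0, -1.0],
              [1.0, -4.0, 2.0],
              [-2.0, -1.0, 5.0]])
b = np.array([3.0, -1.0, 2.0])

def iterate(T, c, u0, steps):
    """Return the iterates u^(0), ..., u^(steps) of u^(k+1) = T u^(k) + c."""
    us = [np.asarray(u0, dtype=float)]
    for _ in range(steps):
        us.append(T @ us[-1] + c)
    return us

# First scheme: T = I - A, c = b.
T1, c1 = np.eye(3) - A, b

# Second scheme: solve equation i for unknown i (divide row i by a_ii).
d = np.diag(A)
T2 = -(A - np.diag(d)) / d[:, None]
c2 = b / d

first = iterate(T1, c1, np.zeros(3), 11)
second = iterate(T2, c2, np.zeros(3), 11)

for k in (1, 2, 3, 11):
    print(k, first[k], np.round(second[k], 4))
```

The first sequence grows without bound, while the second settles near (1, 1, 1), in line with the discussion above.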


The convergence of solutions to (9.42) to the fixed point $u^\star$ is based on the behavior of
the error vectors

\[ e^{(k)} = u^{(k)} - u^\star, \tag{9.47} \]

which measure how close the iterates are to the true solution. Let us find out how the
successive error vectors are related. We compute

\[ e^{(k+1)} = u^{(k+1)} - u^\star = (T u^{(k)} + c) - (T u^\star + c) = T (u^{(k)} - u^\star) = T e^{(k)}, \]

showing that the error vectors satisfy a linear iterative system

\[ e^{(k+1)} = T e^{(k)}, \tag{9.48} \]

with the same coefficient matrix T. Therefore, they are given by the explicit formula

\[ e^{(k)} = T^k e^{(0)}. \]

Now, the solutions to (9.42) converge to the fixed point, $u^{(k)} \to u^\star$, if and only if the error
vectors converge to zero: $e^{(k)} \to 0$ as $k \to \infty$. Our analysis of linear iterative systems, as
summarized in Theorem 9.11, establishes the following basic convergence result.


Proposition 9.34. The solutions to the affine iterative system (9.42) will all converge to
the solution to the fixed point equation (9.43) if and only if T is a convergent matrix, or,
equivalently, its spectral radius satisfies $\rho(T) < 1$.

The spectral radius $\rho(T)$ of the coefficient matrix will govern the speed of convergence.
Therefore, our goal is to construct an iterative system whose coefficient matrix has as small
a spectral radius as possible. At the very least, the spectral radius must be less than 1. For
the two iterative systems presented in Example 9.33, the spectral radii of the coefficient
matrices are found to be

\[ \rho(T) \approx 4.9675, \qquad \rho(\widehat{T}) = .5. \]

Therefore, $T$ is not a convergent matrix, which explains the wild behavior of its iterates,
whereas $\widehat{T}$ is convergent, and one expects the error to decrease by a factor of roughly $\frac12$ at
each step, which is what is observed in practice.
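
The two spectral radii quoted for Example 9.33 can be checked with a few lines of code. This is a sketch under the assumption that NumPy's general eigenvalue routine is accurate enough for these small matrices; the function name `spectral_radius` is our own.

```python
import numpy as np

def spectral_radius(T):
    """Largest modulus among the eigenvalues of the square matrix T."""
    return float(np.max(np.abs(np.linalg.eigvals(T))))

A = np.array([[3.0, 1.0, -1.0],
              [1.0, -4.0, 2.0],
              [-2.0, -1.0, 5.0]])

# Coefficient matrix of the first scheme, T = I - A.
T_first = np.eye(3) - A

# Coefficient matrix of the second scheme (each equation solved for its own unknown).
d = np.diag(A)
T_second = -(A - np.diag(d)) / d[:, None]

rho_first = spectral_radius(T_first)
rho_second = spectral_radius(T_second)
print(rho_first, rho_second)
```

A value above 1 signals divergence for almost every starting vector, and a value below 1 guarantees convergence, as stated in Proposition 9.34.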

The Jacobi Method

The first general iterative method for solving linear systems is based on the same simple
idea used in our illustrative Example 9.33. Namely, we solve the ith equation in the system
$A u = b$, which is

\[ \sum_{j=1}^{n} a_{ij} u_j = b_i, \]

for the ith variable $u_i$. To do this, we need to assume that all the diagonal entries of A are
nonzero: $a_{ii} \ne 0$. The result is

\[ u_i = -\frac{1}{a_{ii}} \sum_{\substack{j=1 \\ j \ne i}}^{n} a_{ij} u_j + \frac{b_i}{a_{ii}} = \sum_{j=1}^{n} t_{ij} u_j + c_i, \tag{9.49} \]

where

\[ t_{ij} = \begin{cases} -\dfrac{a_{ij}}{a_{ii}}, & i \ne j, \\[1ex] 0, & i = j, \end{cases} \qquad \text{and} \qquad c_i = \frac{b_i}{a_{ii}}. \tag{9.50} \]

The result has the form of a fixed-point system $u = T u + c$, and forms the basis of the
Jacobi Method

\[ u^{(k+1)} = T u^{(k)} + c, \qquad u^{(0)} = u_0, \tag{9.51} \]

named after the influential nineteenth-century German analyst Carl Jacobi. The explicit
form of the Jacobi iterative algorithm is

\[ u_i^{(k+1)} = -\frac{1}{a_{ii}} \sum_{\substack{j=1 \\ j \ne i}}^{n} a_{ij} u_j^{(k)} + \frac{b_i}{a_{ii}}. \tag{9.52} \]


It is instructive to rederive the Jacobi Method in matrix form. We begin by decomposing
the coefficient matrix

\[ A = L + D + U \tag{9.53} \]

into the sum of a strictly lower triangular matrix L, meaning all its diagonal entries are 0,
a diagonal matrix D, and a strictly upper triangular matrix U, each of which is uniquely
specified; see Exercise 1.3.11. For example, when

\[ A = \begin{pmatrix} 3 & 1 & -1 \\ 1 & -4 & 2 \\ -2 & -1 & 5 \end{pmatrix}, \tag{9.54} \]

the decomposition (9.53) yields

\[ L = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ -2 & -1 & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & 5 \end{pmatrix}, \qquad U = \begin{pmatrix} 0 & 1 & -1 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}. \]

Warning. The L, D, U in the elementary additive decomposition (9.53) have nothing to
do with the L, D, U appearing in factorizations arising from Gaussian Elimination. The
latter play no role in the iterative solution methods considered in this section.

We then rewrite the system

\[ A u = (L + D + U)\, u = b \qquad \text{in the alternative form} \qquad D u = -(L + U)\, u + b. \]

The Jacobi fixed point system (9.49) amounts to solving the latter for

\[ u = T u + c, \qquad \text{where} \quad T = -D^{-1}(L + U), \quad c = D^{-1} b. \tag{9.55} \]

For the example (9.54), we recover the Jacobi iteration matrix

\[ T = -D^{-1}(L + U) = \begin{pmatrix} 0 & -\frac13 & \frac13 \\ \frac14 & 0 & \frac12 \\ \frac25 & \frac15 & 0 \end{pmatrix}. \]

Deciding in advance whether the Jacobi Method will converge is not easy. However, it
can be shown that Jacobi iteration is guaranteed to converge when the original coefficient
matrix has large diagonal entries, in accordance with Definition 8.18.
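
The matrix form (9.55) of the Jacobi Method translates almost line by line into code. The sketch below assumes NumPy; the function name `jacobi`, the tolerance, the iteration cap, and the stopping test (successive iterates agreeing to within `tol`) are our illustrative choices, not prescriptions from the text.

```python
import numpy as np

def jacobi(A, b, u0=None, tol=1e-10, max_iter=1000):
    """Jacobi iteration u^(k+1) = D^{-1} (b - (L + U) u^(k)).

    Returns the final iterate and the number of iterations performed.
    Assumption: we stop when two successive iterates differ by less than tol
    in the max norm; other stopping rules are equally reasonable.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.diag(A)
    if np.any(d == 0):
        raise ValueError("Jacobi requires nonzero diagonal entries")
    off_diag = A - np.diag(d)                      # this is L + U
    u = np.zeros_like(b) if u0 is None else np.asarray(u0, dtype=float)
    for k in range(1, max_iter + 1):
        u_new = (b - off_diag @ u) / d             # uses only the old iterate
        if np.max(np.abs(u_new - u)) < tol:
            return u_new, k
        u = u_new
    raise RuntimeError("Jacobi iteration did not converge")

A = np.array([[3.0, 1.0, -1.0],
              [1.0, -4.0, 2.0],
              [-2.0, -1.0, 5.0]])
b = np.array([3.0, -1.0, 2.0])
u, iters = jacobi(A, b)
print(u, iters)
```

Because every component of the new iterate is computed from the old one, the update is a single vectorized expression.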

Theorem 9.35. If the coefficient matrix A is strictly diagonally dominant, then the
associated Jacobi iteration converges.

Proof: We shall prove that $\| T \|_\infty < 1$, and so Proposition 9.22 implies that T is a
convergent matrix. The absolute row sums of the Jacobi matrix $T = -D^{-1}(L + U)$ are, according
to (9.50),

\[ s_i = \sum_{j=1}^{n} | t_{ij} | = \frac{1}{| a_{ii} |} \sum_{\substack{j=1 \\ j \ne i}}^{n} | a_{ij} | < 1, \tag{9.56} \]

because A is strictly diagonally dominant, and hence satisfies (8.28). This implies that
$\| T \|_\infty = \max \{ s_1, \ldots, s_n \} < 1$, and the result follows. Q.E.D.
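
The hypothesis and the key estimate (9.56) of Theorem 9.35 can both be tested mechanically. The following sketch assumes NumPy; the function names are ours. It checks strict diagonal dominance row by row and computes the absolute row sums of the Jacobi matrix for the matrix (9.54).

```python
import numpy as np

def is_strictly_diagonally_dominant(A):
    """True if |a_ii| exceeds the sum of |a_ij|, j != i, in every row."""
    absA = np.abs(np.asarray(A, dtype=float))
    diag = np.diag(absA)
    off_diag_sums = absA.sum(axis=1) - diag
    return bool(np.all(diag > off_diag_sums))

def jacobi_row_sums(A):
    """Absolute row sums s_i of the Jacobi matrix T = -D^{-1}(L + U)."""
    A = np.asarray(A, dtype=float)
    d = np.diag(A)
    T = -(A - np.diag(d)) / d[:, None]
    return np.abs(T).sum(axis=1)

A = np.array([[3.0, 1.0, -1.0],
              [1.0, -4.0, 2.0],
              [-2.0, -1.0, 5.0]])

dominant = is_strictly_diagonally_dominant(A)
row_sums = jacobi_row_sums(A)
print(dominant, row_sums, row_sums.max())
```

For this matrix the row sums are 2/3, 3/4 and 3/5, so the infinity norm of the Jacobi matrix is 3/4, which is below 1 as the proof requires.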


Example 9.36. Consider the linear system

\[ \begin{aligned}
4x + y + w &= 1, \\
x + 4y + z + v &= 2, \\
y + 4z + w &= -1, \\
x + z + 4w + v &= 2, \\
y + w + 4v &= 1.
\end{aligned} \]

The Jacobi Method solves the respective equations for x, y, z, w, v, leading to the iterative
equations

\[ \begin{aligned}
x^{(k+1)} &= -\tfrac14 y^{(k)} - \tfrac14 w^{(k)} + \tfrac14, \\
y^{(k+1)} &= -\tfrac14 x^{(k)} - \tfrac14 z^{(k)} - \tfrac14 v^{(k)} + \tfrac12, \\
z^{(k+1)} &= -\tfrac14 y^{(k)} - \tfrac14 w^{(k)} - \tfrac14, \\
w^{(k+1)} &= -\tfrac14 x^{(k)} - \tfrac14 z^{(k)} - \tfrac14 v^{(k)} + \tfrac12, \\
v^{(k+1)} &= -\tfrac14 y^{(k)} - \tfrac14 w^{(k)} + \tfrac14.
\end{aligned} \]

The coefficient matrix of the original system,

\[ A = \begin{pmatrix} 4 & 1 & 0 & 1 & 0 \\ 1 & 4 & 1 & 0 & 1 \\ 0 & 1 & 4 & 1 & 0 \\ 1 & 0 & 1 & 4 & 1 \\ 0 & 1 & 0 & 1 & 4 \end{pmatrix}, \]

is strictly diagonally dominant, and so we are guaranteed that the Jacobi iterations will
eventually converge to the solution. Indeed, the Jacobi scheme takes the iterative form
(9.55), with

\[ T = \begin{pmatrix} 0 & -\frac14 & 0 & -\frac14 & 0 \\ -\frac14 & 0 & -\frac14 & 0 & -\frac14 \\ 0 & -\frac14 & 0 & -\frac14 & 0 \\ -\frac14 & 0 & -\frac14 & 0 & -\frac14 \\ 0 & -\frac14 & 0 & -\frac14 & 0 \end{pmatrix}, \qquad c = \begin{pmatrix} \frac14 \\ \frac12 \\ -\frac14 \\ \frac12 \\ \frac14 \end{pmatrix}. \]

Note that $\| T \|_\infty = \frac34 < 1$, validating convergence. Thus, to obtain, say, four decimal place
accuracy in the solution, we estimate that it will take fewer than $\log(.5 \times 10^{-4}) / \log .75 \approx 34$
iterates, assuming a moderate initial error. But the matrix norm always underestimates
the true rate of convergence, as prescribed by the spectral radius $\rho(T) = .6124$, which
would imply about $\log(.5 \times 10^{-4}) / \log .6124 \approx 20$ iterations to obtain the desired accuracy.
Indeed, starting with the initial guess $x^{(0)} = y^{(0)} = z^{(0)} = w^{(0)} = v^{(0)} = 0$, the Jacobi
iterates converge to the exact solution

\[ x = -.1, \qquad y = .7, \qquad z = -.6, \qquad w = .7, \qquad v = -.1, \]

to within four decimal places in exactly 20 iterations.
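
The two iteration-count estimates of Example 9.36 can be compared with an actual run. This is a sketch assuming NumPy; the accuracy target of $.5 \times 10^{-4}$ in the max norm is our reading of "four decimal place accuracy", and the variable names are ours.

```python
import math
import numpy as np

A = np.array([[4.0, 1.0, 0.0, 1.0, 0.0],
              [1.0, 4.0, 1.0, 0.0, 1.0],
              [0.0, 1.0, 4.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 4.0, 1.0],
              [0.0, 1.0, 0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, -1.0, 2.0, 1.0])
exact = np.array([-0.1, 0.7, -0.6, 0.7, -0.1])

# Jacobi matrix T = -D^{-1}(L + U) and vector c = D^{-1} b.
d = np.diag(A)
T = -(A - np.diag(d)) / d[:, None]
c = b / d

norm_inf = np.abs(T).sum(axis=1).max()          # maximal absolute row sum
rho = np.max(np.abs(np.linalg.eigvals(T)))      # spectral radius

target = 0.5e-4                                  # assumed accuracy threshold
est_from_norm = math.log(target) / math.log(norm_inf)
est_from_rho = math.log(target) / math.log(rho)

# Count the Jacobi steps needed, starting from the zero vector.
u = np.zeros(5)
count = 0
while np.max(np.abs(u - exact)) >= target:
    u = T @ u + c
    count += 1

print(norm_inf, rho, est_from_norm, est_from_rho, count)
```

The estimate based on the spectral radius tracks the observed count much more closely than the one based on the matrix norm.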


Exercises 


9.4.1.  (a)  Find  the  spectral  radius  of  the  matrix  T  = 


“I  -5 


(b) Predict the long term behavior of the iterative system $u^{(k+1)} = T u^{(k)} + b$, where
$b = \begin{pmatrix} -1 \\ 2 \end{pmatrix}$, in as much detail as you can.


9.4.2.  Answer  Exercise  9.4.1  when  (a)  T  = 


1 

1 


(b)  T  = 


A  4  o^ 

0  I 
1  \) 


0 
1 


b  = 


(c)  T  = 


b=  (?  ; 

w 

.15  .15  \ 

/-L5\ 

.15  -.35 

,  b  = 

1.6 

-.2  .3  ) 

l  1-7  y 

(a) 


(b) 


9.4.3.  Which  of  the  following  systems  have  a  strictly  diagonally  dominant  coefficient  matrix? 

i  i  —  2x  y  -\~  z  = 

hx+  ly  =  1,  -5x  +  y  =  3,  .  . 

1  "  c  (c)  ,  ,9  _  (d)  -x  +  2y-z  =  -2, 

y  x-y  +  3z  =  l] 

x  —  2y  -\-  z  =  1,  —  Ax  +  2y  +  z  =  2, 

(f)  -x  +  2y  +  z  =  -1,  (g)  -x  +  3y  +  z  =  -1, 

x  +  3y  —  2z  =  3]  x  +  4y  — 6^  =  3. 


(e) 


5x  —  y  =  1, 

—  x-  +  3  y  =  — 1; 

—  x  A  ^2/  +  \  z  =  1, 
^xA27/A|^  =  -3, 


^x+\y  ~^z  =  2\ 


♠ 9.4.4. For the strictly diagonally dominant systems in Exercise 9.4.3, starting with the initial
guess x = y = z = 0, compute the solution to 2 decimal places using the Jacobi Method.
Check your answer by solving the system directly by Gaussian Elimination.

♠ 9.4.5. (a) Do any of the non-strictly diagonally dominant systems in Exercise 9.4.3 lead to
convergent Jacobi algorithms? Hint: Check the spectral radius of the Jacobi matrix.
(b) For the convergent systems in Exercise 9.4.3, starting with the initial guess x = y = z =
0, compute the solution to 2 decimal places by using the Jacobi Method, and check your
answer by solving the system directly by Gaussian Elimination.

9.4.6. The following linear systems have positive definite coefficient matrices. Use the Jacobi
Method starting with $u^{(0)} = 0$ to find the solution to 4 decimal place accuracy.


(a) 


3 

-1 


1 

5 


u 


2 

1 


(b) 


2 

1 


1 

1 


u 


3 

1 


(c) 


/ 


V 


6 

-1 

-3 


1 

7 

4 


4 


\ 

u  = 

-2 

) 

^  7) 

(d) 


( 


V 


3 

1 

0 


1 

2 

1 


u 


/ 


V 


(e) 


(  5 

1 

1 

n 

(4\ 

(  3 

1 

0 

-n 

/  1\ 

1 

5 

1 

i 

0 

,  (f) 

1 

3 

1 

0 

2 

1 

1 

5 

i 

u  = 

0 

0 

1 

3 

i 

u  = 

0 

u 

1 

1 

5  / 

\o) 

w 

0 

1 

3  / 
♣ 9.4.7. Let A be the $n \times n$ tridiagonal matrix with all its diagonal entries equal to c and all
1's on the sub- and super-diagonals. (a) For which values of c is A strictly diagonally
dominant? (b) For which values of c does the Jacobi iteration for $A u = b$ converge to the
solution? What is the rate of convergence? Hint: Use Exercise 8.2.48. (c) Set c = 2 and
use the Jacobi Method to solve the linear systems $K u = e_1$, for n = 5, 10, and 20. Starting
with an initial guess of 0, how many Jacobi iterations does it take to obtain 3 decimal place
accuracy? Does the convergence rate agree with what you computed in part (b)?

9.4.8. Prove that $0 \ne u \in \ker A$ if and only if u is an eigenvector of the Jacobi iteration matrix
with eigenvalue 1. What does this imply about convergence?

♦ 9.4.9. Prove that if A is a nonsingular coefficient matrix, then one can always arrange that all
its diagonal entries are nonzero by suitably permuting its rows.

9.4.10. Consider the iterative system (9.42) with spectral radius $\rho(T) < 1$. Explain why it
takes roughly $-1 / \log_{10} \rho(T)$ iterations to produce one further decimal digit of accuracy in
the solution.

9.4.11. True or false: If a system $A u = b$ has a strictly diagonally dominant coefficient matrix
A, then the equivalent system obtained by applying an elementary row operation to A also
has a strictly diagonally dominant coefficient matrix.


The Gauss-Seidel Method

The Gauss-Seidel Method relies on a slightly refined implementation of the Jacobi process.
To understand how it works, it will help to write out the Jacobi iteration algorithm (9.51)
in full detail:

\[ \begin{aligned}
u_1^{(k+1)} &= t_{12} u_2^{(k)} + t_{13} u_3^{(k)} + \cdots + t_{1,n-1} u_{n-1}^{(k)} + t_{1n} u_n^{(k)} + c_1, \\
u_2^{(k+1)} &= t_{21} u_1^{(k)} + t_{23} u_3^{(k)} + \cdots + t_{2,n-1} u_{n-1}^{(k)} + t_{2n} u_n^{(k)} + c_2, \\
u_3^{(k+1)} &= t_{31} u_1^{(k)} + t_{32} u_2^{(k)} + \cdots + t_{3,n-1} u_{n-1}^{(k)} + t_{3n} u_n^{(k)} + c_3, \\
&\;\;\vdots \\
u_n^{(k+1)} &= t_{n1} u_1^{(k)} + t_{n2} u_2^{(k)} + t_{n3} u_3^{(k)} + \cdots + t_{n,n-1} u_{n-1}^{(k)} + c_n,
\end{aligned} \tag{9.57} \]

where we are explicitly noting the fact that all the diagonal entries of the coefficient matrix
T vanish. Observe that we are using the entries of the current iterate $u^{(k)}$ to compute
all of the updated values of $u^{(k+1)}$. Presumably, if the iterates $u^{(k)}$ are converging to the
solution $u^\star$, then their individual entries are also converging, and so each $u_j^{(k+1)}$ should be
a better approximation to $u_j^\star$ than $u_j^{(k)}$ is. Therefore, if we begin the kth Jacobi iteration
by computing $u_1^{(k+1)}$ using the first equation, then we are tempted to use this new and
improved value to replace $u_1^{(k)}$ in each of the subsequent equations. In particular, we
employ the modified equation

\[ u_2^{(k+1)} = t_{21} u_1^{(k+1)} + t_{23} u_3^{(k)} + \cdots + t_{2n} u_n^{(k)} + c_2 \]

to update the second component of our iterate. This more accurate value should then be
used to update $u_3^{(k+1)}$, and so on.

The upshot of these considerations is the Gauss-Seidel Method

\[ u_i^{(k+1)} = t_{i1} u_1^{(k+1)} + \cdots + t_{i,i-1} u_{i-1}^{(k+1)} + t_{i,i+1} u_{i+1}^{(k)} + \cdots + t_{in} u_n^{(k)} + c_i, \qquad i = 1, \ldots, n, \tag{9.58} \]

named after Gauss (as usual!) and the German astronomer/mathematician Philipp von
Seidel. At the kth stage of the iteration, we use (9.58) to compute the revised entries
$u_1^{(k+1)}, u_2^{(k+1)}, \ldots, u_n^{(k+1)}$ in their numerical order. Once an entry has been updated, the
new value is immediately used in all subsequent computations.
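
The in-place character of the Gauss-Seidel update (9.58) is easiest to see in code. The following sketch assumes NumPy; the function name `gauss_seidel_sweep` is ours, and it works directly with the entries of A and b rather than with the $t_{ij}$ and $c_i$, which is equivalent by (9.50). It is applied here to the system (9.44).

```python
import numpy as np

def gauss_seidel_sweep(A, b, u):
    """Perform one Gauss-Seidel sweep, overwriting the entries of u in order.

    When entry i is updated, u[:i] already holds the new values and
    u[i+1:] still holds the old ones, exactly as in (9.58).
    """
    n = len(b)
    for i in range(n):
        s = A[i, :i] @ u[:i] + A[i, i + 1:] @ u[i + 1:]
        u[i] = (b[i] - s) / A[i, i]
    return u

A = np.array([[3.0, 1.0, -1.0],
              [1.0, -4.0, 2.0],
              [-2.0, -1.0, 5.0]])
b = np.array([3.0, -1.0, 2.0])

u = np.zeros(3)
gauss_seidel_sweep(A, b, u)
first_iterate = u.copy()          # value after a single sweep

for _ in range(50):               # 50 further sweeps, an arbitrary generous number
    gauss_seidel_sweep(A, b, u)

print(first_iterate, u)
```

Only one vector is stored throughout, in contrast to the Jacobi update, which needs both the old and the new iterate.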

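Example 9.37. For the linear system

\[ 3x + y - z = 3, \qquad x - 4y + 2z = -1, \qquad -2x - y + 5z = 2, \]

the Jacobi iteration method was given in (9.46). To construct the corresponding
Gauss-Seidel algorithm we use updated values of x, y, and z as they become available. Explicitly,

\[ \begin{aligned}
x^{(k+1)} &= -\tfrac13 y^{(k)} + \tfrac13 z^{(k)} + 1, \\
y^{(k+1)} &= \tfrac14 x^{(k+1)} + \tfrac12 z^{(k)} + \tfrac14, \\
z^{(k+1)} &= \tfrac25 x^{(k+1)} + \tfrac15 y^{(k+1)} + \tfrac25.
\end{aligned} \]

Starting with $u^{(0)} = 0$, the resulting iterates are

\[ \begin{aligned}
u^{(1)} &= \begin{pmatrix} 1.0000 \\ .5000 \\ .9000 \end{pmatrix}, &
u^{(2)} &= \begin{pmatrix} 1.1333 \\ .9833 \\ 1.0500 \end{pmatrix}, &
u^{(3)} &= \begin{pmatrix} 1.0222 \\ 1.0306 \\ 1.0150 \end{pmatrix}, &
u^{(4)} &= \begin{pmatrix} .9948 \\ 1.0062 \\ .9992 \end{pmatrix}, \\
u^{(5)} &= \begin{pmatrix} .9977 \\ .9990 \\ .9989 \end{pmatrix}, &
u^{(6)} &= \begin{pmatrix} 1.0000 \\ .9994 \\ .9999 \end{pmatrix}, &
u^{(7)} &= \begin{pmatrix} 1.0001 \\ 1.0000 \\ 1.0001 \end{pmatrix}, &
u^{(8)} &= \begin{pmatrix} 1.0000 \\ 1.0000 \\ 1.0000 \end{pmatrix},
\end{aligned} \tag{9.59} \]

and have converged to the solution, to 4 decimal place accuracy, after only 8 iterations,
as opposed to the 11 iterations required by the Jacobi Method.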


Gauss-Seidel iteration is particularly suited to implementation on a serial computer,
since one can immediately replace each component $u_i^{(k)}$ by its updated value $u_i^{(k+1)}$, thereby
also saving on storage in the computer's memory. In contrast, the Jacobi Method requires
us to retain all the old values $u^{(k)}$ until the new approximation $u^{(k+1)}$ has been computed.
Moreover, Gauss-Seidel typically (although not always) converges faster than Jacobi,
making it the iterative algorithm of choice for serial processors. On the other hand, with the
advent of parallel processing machines, variants of the parallelizable Jacobi scheme have
been making a comeback.

What is Gauss-Seidel really up to? Let us rewrite the basic iterative equation (9.58) by
multiplying by $a_{ii}$ and moving the terms involving $u^{(k+1)}$ to the left-hand side. In view of
the formula (9.50) for the entries of T, the resulting equation is

\[ a_{i1} u_1^{(k+1)} + \cdots + a_{i,i-1} u_{i-1}^{(k+1)} + a_{ii} u_i^{(k+1)} = -a_{i,i+1} u_{i+1}^{(k)} - \cdots - a_{in} u_n^{(k)} + b_i. \]

In matrix form, taking (9.53) into account, this reads

\[ (L + D)\, u^{(k+1)} = -U u^{(k)} + b, \tag{9.60} \]

and so can be viewed as a linear system of equations for $u^{(k+1)}$ with lower triangular
coefficient matrix $L + D$. Note that the fixed point of (9.60), namely the solution to

\[ (L + D)\, u = -U u + b, \]

coincides with the solution to the original system

\[ A u = (L + D + U)\, u = b. \]

In other words, the Gauss-Seidel procedure is merely implementing Forward Substitution
to solve the lower triangular system (9.60) for the next iterate:

\[ u^{(k+1)} = -(L + D)^{-1} U u^{(k)} + (L + D)^{-1} b. \]

The latter is in our more usual iterative form

\[ u^{(k+1)} = \widetilde{T} u^{(k)} + \widetilde{c}, \qquad \text{where} \quad \widetilde{T} = -(L + D)^{-1} U, \quad \widetilde{c} = (L + D)^{-1} b. \tag{9.61} \]

Consequently, the convergence of the Gauss-Seidel iterates is governed by the spectral
radius of their coefficient matrix $\widetilde{T}$.

Returning to Example 9.37, we have

\[ L + D = \begin{pmatrix} 3 & 0 & 0 \\ 1 & -4 & 0 \\ -2 & -1 & 5 \end{pmatrix}, \qquad U = \begin{pmatrix} 0 & 1 & -1 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}. \]

Therefore, the Gauss-Seidel matrix is

\[ \widetilde{T} = -(L + D)^{-1} U = \begin{pmatrix} 0 & -.3333 & .3333 \\ 0 & -.0833 & .5833 \\ 0 & -.1500 & .2500 \end{pmatrix}. \]

Its eigenvalues are 0 and $.0833 \pm .2444\, i$, and hence its spectral radius is $\rho(\widetilde{T}) \approx .2582$.
This is roughly the square of the Jacobi spectral radius of .5, which tells us that the
Gauss-Seidel iterations will converge about twice as fast to the solution. This can be
verified by more extensive computations. Although examples can be constructed in which
the Jacobi Method converges faster, in many practical situations Gauss-Seidel tends to
converge roughly twice as fast as Jacobi.
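
The Gauss-Seidel matrix of (9.61) can be formed numerically by solving the lower triangular system rather than inverting $L + D$ explicitly. This is a sketch assuming NumPy; `np.tril` and `np.triu` extract the triangular parts, and the variable names are ours.

```python
import numpy as np

A = np.array([[3.0, 1.0, -1.0],
              [1.0, -4.0, 2.0],
              [-2.0, -1.0, 5.0]])

lower = np.tril(A)            # L + D: diagonal and everything below it
upper = np.triu(A, k=1)       # U: strictly upper triangular part

# Gauss-Seidel matrix -(L + D)^{-1} U, obtained column by column via solve.
T_gs = -np.linalg.solve(lower, upper)

# Jacobi matrix -D^{-1}(L + U), for comparison.
d = np.diag(A)
T_jacobi = -(A - np.diag(d)) / d[:, None]

rho_gs = np.max(np.abs(np.linalg.eigvals(T_gs)))
rho_jacobi = np.max(np.abs(np.linalg.eigvals(T_jacobi)))

print(np.round(T_gs, 4))
print(rho_gs, rho_jacobi, rho_jacobi ** 2)
```

Comparing `rho_gs` with `rho_jacobi ** 2` makes the "roughly the square" remark concrete for this example.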

Completely general conditions guaranteeing convergence of the Gauss-Seidel Method
are also hard to establish. But, like the Jacobi Method, it is guaranteed to converge when
the original coefficient matrix is strictly diagonally dominant.

Theorem 9.38. If A is strictly diagonally dominant, then the Gauss-Seidel iteration
algorithm for solving $A u = b$ converges.

Proof: Let $e^{(k)} = u^{(k)} - u^\star$ denote the kth Gauss-Seidel error vector. As in (9.48), the
error vectors satisfy the linear iterative system $e^{(k+1)} = \widetilde{T} e^{(k)}$, but a direct estimate of
$\| \widetilde{T} \|_\infty$ is not so easy. Instead, let us write out the linear iterative system in components:

\[ e_i^{(k+1)} = t_{i1} e_1^{(k+1)} + \cdots + t_{i,i-1} e_{i-1}^{(k+1)} + t_{i,i+1} e_{i+1}^{(k)} + \cdots + t_{in} e_n^{(k)}. \tag{9.62} \]

Let

\[ m^{(k)} = \| e^{(k)} \|_\infty = \max \bigl\{ | e_1^{(k)} |, \ldots, | e_n^{(k)} | \bigr\} \tag{9.63} \]

denote the $\infty$ norm of the kth error vector. To prove convergence, $e^{(k)} \to 0$, it suffices to
show that $m^{(k)} \to 0$ as $k \to \infty$. We claim that diagonal dominance of A implies that

\[ m^{(k+1)} \le s\, m^{(k)}, \qquad \text{where} \quad s = \| T \|_\infty < 1 \tag{9.64} \]

denotes the $\infty$ matrix norm of the Jacobi matrix $T$ (not the Gauss-Seidel matrix $\widetilde{T}$),
which, by (9.56), is less than 1. We infer that $m^{(k)} \le s^k m^{(0)} \to 0$ as $k \to \infty$, demonstrating
the theorem.

To prove (9.64), we use induction on $i = 1, \ldots, n$. Our induction hypothesis is

\[ | e_j^{(k+1)} | \le s\, m^{(k)} \le m^{(k)} \qquad \text{for} \quad j = 1, \ldots, i - 1. \]

(When i = 1, there is no assumption.) Moreover, by (9.63),

\[ | e_j^{(k)} | \le m^{(k)} \qquad \text{for all} \quad j = 1, \ldots, n. \]

We use these two inequalities to estimate $| e_i^{(k+1)} |$ from (9.62):

\[ | e_i^{(k+1)} | \le | t_{i1} |\, | e_1^{(k+1)} | + \cdots + | t_{i,i-1} |\, | e_{i-1}^{(k+1)} | + | t_{i,i+1} |\, | e_{i+1}^{(k)} | + \cdots + | t_{in} |\, | e_n^{(k)} | \le \bigl( | t_{i1} | + \cdots + | t_{in} | \bigr)\, m^{(k)} \le s\, m^{(k)}, \]

which completes the induction step. As a result, the maximum

\[ m^{(k+1)} = \max \bigl\{ | e_1^{(k+1)} |, \ldots, | e_n^{(k+1)} | \bigr\} \le s\, m^{(k)} \]

also satisfies the same bound, and hence (9.64) follows. Q.E.D.


Example  9.39.  For  the  linear  system  considered  in  Example  9.36,  the  Gauss-Seidel 
iterations  take  the  form 

yfc+l)  =  -  I  y(k)  -  1/)  +  1, 

?y(fc+i)  _  1  (k+ 1)  _  1  jk)  _  1  (k)  ,  1 

y(fc+l)  —  _i?/<A+1)  _  _  4 

^  4  y  4  ^  45 

^(fc+i) _ _  _  1  ^(fc+i)  _  i  -y(^)  _j_  i  ^ 

^(^+1) = _  1  ^(^+1)  _  iw(hi) + 1  _ 

Starting  with  =  y(°)  =  z =  0,  the  Gauss-Seidel  iterates  converge  to 
the  solution  x  =  — .  1 ,  y  —  .7,  z  —  —  .6,  w  —  . 7,  2;  =  —.1,  to  four  decimal  places  in  11 
iterations,  again  roughly  twice  as  fast  as  the  Jacobi  Method.  Indeed,  the  convergence  rate 
is  governed  by  the  corresponding  Gauss-Seidel  matrix  T,  which  is 


/4 

0 

0 
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°\ 
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/° 
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0\ 

/0 
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0  \ 

1 

4 

0 

0 

0 

0 

0 

1 

0 

1 

0 

.0625 

-.2500 
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0 

.0664 

-.0156 
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1 

0 

1 

4/ 

Vo 

0 

0 

0 

0/ 

Vo 

-.0322 

.0664 

-.0479 

.1289/ 

Its spectral radius is $\rho(T) = .3936$, which is, as in the previous example, approximately the
square of the spectral radius of the Jacobi coefficient matrix, which explains the speedup
in convergence.
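
The Gauss-Seidel sweep used in this example can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the text: the function name `gauss_seidel` and the stopping test (largest entry change below `tol`) are assumptions, and the right-hand side `b` is inferred from the coefficient matrix and the solution quoted in the example.

```python
def gauss_seidel(A, b, tol=1e-4, max_iter=200):
    """Gauss-Seidel sweeps for A u = b, starting from the zero vector.

    Returns the approximate solution and the number of sweeps performed.
    The stopping rule (max entry change < tol) is an illustrative choice.
    """
    n = len(b)
    u = [0.0] * n
    for k in range(1, max_iter + 1):
        change = 0.0
        for i in range(n):
            # u[0..i-1] already hold the new values; u[i+1..] are still old.
            s = sum(A[i][j] * u[j] for j in range(n) if j != i)
            new = (b[i] - s) / A[i][i]
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            return u, k
    return u, max_iter


# Coefficient matrix (L + D) + U of the example; b is inferred (assumption).
A = [[4, 1, 0, 1, 0],
     [1, 4, 1, 0, 1],
     [0, 1, 4, 1, 0],
     [1, 0, 1, 4, 1],
     [0, 1, 0, 1, 4]]
b = [1, 2, -1, 2, 1]

u, sweeps = gauss_seidel(A, b)
print([round(x, 4) for x in u], "after", sweeps, "sweeps")
```

Because each entry is overwritten as soon as it is computed, a single array suffices, whereas a Jacobi sweep would need to keep the previous iterate separately.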
Exercises


/ 

4 

1 

-2\ 

—2  \ 

C  9.4.12.  Consider  the  linear  system  Ax  =  b,  where  A  = 

V 

-1 

1 

4 

-1 

-i 

b  b  = 

T - 1  J>- 

_ _ ' 

(a) First, solve the equation directly by Gaussian Elimination. (b) Write the Jacobi
iteration in the form $x^{(k+1)} = T x^{(k)} + c$. Find the $3 \times 3$ matrix $T$ and the vector $c$
explicitly. (c) Using the initial approximation $x^{(0)} = 0$, carry out three iterations of the
Jacobi algorithm to compute $x^{(1)}$, $x^{(2)}$, and $x^{(3)}$. How close are you to the exact solution?
(d) Write the Gauss-Seidel iteration in the form $x^{(k+1)} = \widetilde{T} x^{(k)} + \tilde{c}$. Find the $3 \times 3$
matrix $\widetilde{T}$ and the vector $\tilde{c}$ explicitly. (e) Using the initial approximation $x^{(0)} = 0$, carry
out three iterations of the Gauss-Seidel algorithm. Which is a better approximation to
the solution: Jacobi or Gauss-Seidel? (f) Determine the spectral radius of the Jacobi
matrix $T$, and use this to prove that the Jacobi Method will converge to the solution of
$Ax = b$ for any choice of the initial approximation $x^{(0)}$. (g) Determine the spectral
radius of the Gauss-Seidel matrix $\widetilde{T}$. Which method converges faster? (h) For the faster
method, how many iterations would you expect to need to obtain 5 decimal place accuracy?
(i) Test your prediction by computing the solution to the desired accuracy.

4k  9.4.13.  For  the  strictly  diagonally  dominant  systems  in  Exercise  9.4.3,  starting  with  the  initial 
guess  x  =  y  =  z  =  0,  compute  the  solution  to  3  decimal  places  using  the  Gauss-Seidel 
Method.  Check  your  answer  by  solving  the  system  directly  by  Gaussian  Elimination. 

9.4.14.  Which  of  the  systems  in  Exercise  9.4.3  lead  to  convergent  Gauss-Seidel  algorithms?  In 
each  case,  which  converges  faster,  Jacobi  or  Gauss-Seidel? 

9.4.15.  (a)  Solve  the  positive  definite  linear  systems  in  Exercise  9.4.6  using  the  Gauss-Seidel 
Method  to  achieve  4  decimal  place  accuracy. 

(b)  Compare  the  convergence  rate  with  that  of  the  Jacobi  Method. 


4»  9.4.16. Let $A = \begin{pmatrix} c & 1 & 0 & 0 \\ 1 & c & 1 & 0 \\ 0 & 1 & c & 1 \\ 0 & 0 & 1 & c \end{pmatrix}$. (a) For what values of $c$ is $A$ strictly diagonally dominant?
(b) Use a computer to find the smallest positive value of $c > 0$ for which Jacobi iteration
converges. (c) Find the smallest positive value of $c > 0$ for which Gauss-Seidel iteration
converges. Is your answer the same? (d) When they both converge, which converges faster,
Jacobi or Gauss-Seidel? How much faster? Does your answer depend upon the value of
$c$?


4k  9.4.17.  Consider  the  linear  system 

$2.4x - .8y + .8z = 1, \quad -.6x + 3.6y - .6z = 0, \quad 15x + 14.4y - 3.6z = 0.$

Show,  by  direct  computation,  that  Jacobi  iteration  converges  to  the  solution,  but  Gauss- 
Seidel  does  not. 


4k  9.4.18.  Discuss  convergence  of  Gauss-Seidel  iteration  for  the  system 

$5x + 7y + 6z + 5w = 23, \quad 6x + 8y + 10z + 9w = 33,$

$7x + 10y + 8z + 7w = 32, \quad 5x + 7y + 9z + 10w = 31.$


9.4.19. Let $A = \begin{pmatrix} 2 & 4 & -4 \\ 3 & 3 & 3 \\ 2 & 2 & 1 \end{pmatrix}$. Find the spectral radius of the Jacobi and Gauss-Seidel
iteration matrices, and discuss their convergence.


4k  9.4.20. Consider the linear system $H_5 u = e_1$, where $H_5$ is the $5 \times 5$ Hilbert matrix. Does the
Jacobi Method converge to the solution? If so, how fast? What about Gauss-Seidel?

0  9.4.21. How many arithmetic operations are needed to perform $k$ steps of the Jacobi iteration?
What about Gauss-Seidel? Under what conditions is Jacobi or Gauss-Seidel more efficient
than Gaussian Elimination?


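*  9.4.22. Consider the linear system $Ax = e_1$ based on the $10 \times 10$ pentadiagonal matrix

\[
A = \begin{pmatrix}
z & -1 & 1 & & & & & \\
-1 & z & -1 & 1 & & & & \\
1 & -1 & z & -1 & 1 & & & \\
 & 1 & -1 & z & -1 & 1 & & \\
 & & \ddots & \ddots & \ddots & \ddots & \ddots & \\
 & & & 1 & -1 & z & -1 & 1 \\
 & & & & 1 & -1 & z & -1 \\
 & & & & & 1 & -1 & z
\end{pmatrix}.
\]

(a) For what values of $z$ are the Jacobi and Gauss-Seidel Methods guaranteed to converge?
(b) Set $z = 4$. How many iterations are required to approximate the solution to 3 decimal
places? (c) How small can $|z|$ be before the methods diverge?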


4»  9.4.23. The naive iterative method for solving $Au = b$ is to rewrite it in fixed point form
$u = Tu + c$, where $T = I - A$ and $c = b$. (a) What conditions on the eigenvalues of $A$
ensure convergence of the naive method? (b) Use the Gershgorin Theorem 8.16 to prove


that  the  naive  method  converges  to  the  solution  to 
(c)  Check  part  (b)  by  implementing  the  method. 


-.1 

1.5 

-.1 


Successive  Over-Relaxation 

As  we  know,  the  smaller  the  spectral  radius  (or  matrix  norm)  of  the  coefficient  matrix, 
the  faster  the  convergence  of  the  iterative  algorithm.  One  of  the  goals  of  researchers  in 
numerical  linear  algebra  is  to  design  new  methods  for  accelerating  the  convergence.  In  his 
1950  thesis,  the  American  mathematician  David  Young  discovered  a  simple  modification  of 
the  Jacobi  and  Gauss-Seidel  Methods  that  can,  in  favorable  situations,  lead  to  a  dramatic 
speedup in the rate of convergence. The method, known as Successive Over-Relaxation,
and  often  abbreviated  SOR,  has  become  the  iterative  method  of  choice  in  a  range  of  modern 
applications,  [21,  86].  In  this  subsection,  we  provide  a  brief  overview. 

In  practice,  finding  the  optimal  iterative  algorithm  to  solve  a  given  linear  system  is  as 
hard  as  solving  the  system  itself.  Therefore,  numerical  analysts  have  relied  on  a  few  tried 
and  true  techniques  for  designing  iterative  schemes  that  can  be  used  in  the  more  common 
applications. Consider a linear algebraic system $Au = b$. Every decomposition of the
coefficient matrix into the difference of two matrices,

\[ A = M - N, \tag{9.65} \]

leads to an equivalent system of the form

\[ Mu = Nu + b. \tag{9.66} \]

Provided that $M$ is nonsingular, we can rewrite the preceding system in fixed point form:

\[ u = M^{-1}N u + M^{-1}b = Tu + c, \quad \text{where} \quad T = M^{-1}N, \quad c = M^{-1}b. \]

Now, we are free to choose any such $M$, which then specifies $N = M - A$ uniquely. However,
for the resulting iterative method $u^{(k+1)} = T u^{(k)} + c$ to be practical we must arrange that

(a) $T = M^{-1}N$ is a convergent matrix, and

(b) $M$ can be easily inverted.
The second requirement ensures that the iterative equations

\[ M u^{(k+1)} = N u^{(k)} + b \tag{9.67} \]

can be solved for $u^{(k+1)}$ with minimal computational effort. Typically, this requires that
$M$ be either a diagonal matrix, in which case the inversion is immediate, or lower or upper
triangular, in which case one employs Forward or Back Substitution to solve for $u^{(k+1)}$.

With this in mind, we now introduce the SOR Method. It relies on a slight generalization
of the Gauss-Seidel decomposition (9.60) of the matrix into lower triangular and strictly
upper triangular parts. The starting point is to write


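\[ A = L + D + U = (L + \alpha D) - \bigl[\, (\alpha - 1)\, D - U \,\bigr], \tag{9.68} \]

where $\alpha \ne 0$ is an adjustable scalar parameter. We decompose the system $Au = b$ as

\[ (L + \alpha D)\, u = \bigl[\, (\alpha - 1)\, D - U \,\bigr]\, u + b. \tag{9.69} \]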

It turns out to be slightly more convenient to divide (9.69) through by $\alpha$ and write the
resulting iterative system in the form

\[ (\omega L + D)\, u^{(k+1)} = \bigl[\, (1 - \omega)\, D - \omega U \,\bigr]\, u^{(k)} + \omega b, \tag{9.70} \]

where $\omega = 1/\alpha$ is called the relaxation parameter. Assuming, as usual, that all diagonal
entries of $A$ are nonzero, the matrix $\omega L + D$ is an invertible lower triangular matrix, and
so we can use Forward Substitution to solve the iterative system (9.70) to recover $u^{(k+1)}$.
The explicit formula for its $i$th entry is


\[
u_i^{(k+1)} = \omega t_{i1} u_1^{(k+1)} + \cdots + \omega t_{i,i-1} u_{i-1}^{(k+1)} + (1 - \omega)\, u_i^{(k)} + \omega t_{i,i+1} u_{i+1}^{(k)} + \cdots + \omega t_{in} u_n^{(k)} + \omega c_i, \tag{9.71}
\]

where $t_{ij}$ and $c_i$ denote the original Jacobi values (9.50). As in the Gauss-Seidel approach,
we update the entries $u_i^{(k+1)}$ in numerical order $i = 1, \ldots, n$. Thus, to obtain the SOR
scheme (9.71), we merely multiply the right-hand side of the Gauss-Seidel system (9.58)
by the adjustable relaxation parameter $\omega$ and append the diagonal term $(1 - \omega)\, u_i^{(k)}$. In
particular, if we set $\omega = 1$, then the SOR Method reduces to the Gauss-Seidel Method.
Choosing $\omega < 1$ leads to an under-relaxed method, while $\omega > 1$, known as over-relaxation,
is the preferred choice in most practical instances.
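
As a rough illustration of the entrywise SOR update, the following Python sketch blends each Gauss-Seidel value with the previous entry using the relaxation parameter. The function name `sor`, the stopping rule, and the small tridiagonal test system are all assumptions made for the sake of the example; they do not come from the text.

```python
def sor(A, b, omega, tol=1e-10, max_iter=10_000):
    """SOR sweeps for A u = b, starting from the zero vector.

    Each entry is updated in order i = 0, ..., n-1 as
        u_i <- (1 - omega) * u_i + omega * (Gauss-Seidel value),
    so omega = 1 reproduces plain Gauss-Seidel.
    """
    n = len(b)
    u = [0.0] * n
    for k in range(1, max_iter + 1):
        change = 0.0
        for i in range(n):
            s = sum(A[i][j] * u[j] for j in range(n) if j != i)
            gauss_seidel_value = (b[i] - s) / A[i][i]
            new = (1 - omega) * u[i] + omega * gauss_seidel_value
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            return u, k
    return u, max_iter


# Hypothetical test system with exact solution (1, 2, 3).
A = [[4, -1, 0],
     [-1, 4, -1],
     [0, -1, 4]]
b = [2, 4, 10]

for omega in (1.0, 1.1):
    u, sweeps = sor(A, b, omega)
    print(f"omega = {omega}: {[round(x, 6) for x in u]} in {sweeps} sweeps")
```

Trying a few values of `omega` between 0 and 2 on a system of interest is a simple way to see how sensitive the sweep count is to the relaxation parameter.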

To analyze the SOR algorithm in detail, we rewrite (9.70) in the fixed point form

\[ u^{(k+1)} = T_\omega u^{(k)} + c_\omega, \tag{9.72} \]

where

\[ T_\omega = (\omega L + D)^{-1} \bigl[\, (1 - \omega)\, D - \omega U \,\bigr], \qquad c_\omega = (\omega L + D)^{-1}\, \omega b. \tag{9.73} \]

The rate of convergence is governed by the spectral radius of the matrix $T_\omega$. The goal
is to choose the relaxation parameter $\omega$ so as to make the spectral radius of $T_\omega$ as small
as possible. As we will see, a clever choice of $\omega$ can result in a dramatic speedup in the
convergence rate. Let us look at an elementary example.


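Example 9.40. Consider the matrix $A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$, which we decompose as
$A = L + D + U$, where

\[ L = \begin{pmatrix} 0 & 0 \\ -1 & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \qquad U = \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix}. \]

Jacobi iteration is based on the coefficient matrix

\[ T = -D^{-1}(L + U) = \begin{pmatrix} 0 & \tfrac12 \\ \tfrac12 & 0 \end{pmatrix}. \]

Its spectral radius is $\rho(T) = .5$, and hence the Jacobi Method takes, on average, roughly
$-1/\log_{10} .5 \approx 3.3$ iterations to produce each new decimal place in the solution.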

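The SOR Method (9.70) takes the explicit form

\[ \begin{pmatrix} 2 & 0 \\ -\omega & 2 \end{pmatrix} u^{(k+1)} = \begin{pmatrix} 2(1 - \omega) & \omega \\ 0 & 2(1 - \omega) \end{pmatrix} u^{(k)} + \omega b, \]

where Gauss-Seidel is the particular case $\omega = 1$. The SOR coefficient matrix is

\[ T_\omega = \begin{pmatrix} 2 & 0 \\ -\omega & 2 \end{pmatrix}^{-1} \begin{pmatrix} 2(1 - \omega) & \omega \\ 0 & 2(1 - \omega) \end{pmatrix} = \begin{pmatrix} 1 - \omega & \tfrac12 \omega \\ \tfrac12 \omega (1 - \omega) & \tfrac14 (2 - \omega)^2 \end{pmatrix}. \]

To compute the eigenvalues of $T_\omega$, we form its characteristic equation:

\[ 0 = \det(T_\omega - \lambda I) = \lambda^2 - \bigl( 2 - 2\omega + \tfrac14 \omega^2 \bigr) \lambda + (1 - \omega)^2 = (\lambda + \omega - 1)^2 - \tfrac14 \lambda \omega^2. \tag{9.74} \]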


Our goal is to choose $\omega$ such that

(a) both eigenvalues are less than 1 in modulus, so $|\lambda_1|, |\lambda_2| < 1$. This is the minimal
requirement for convergence of the method.

(b) the largest eigenvalue (in modulus) is as small as possible. This will give the
smallest spectral radius for $T_\omega$ and hence the fastest convergence rate.

By (8.26), the product of the two eigenvalues is the determinant,

\[ \lambda_1 \lambda_2 = \det T_\omega = (1 - \omega)^2. \]

If $\omega \le 0$ or $\omega \ge 2$, then $\det T_\omega \ge 1$, and hence at least one of the eigenvalues would have
modulus at least 1. Thus, in order to ensure convergence, we must require $0 < \omega < 2$.
For Gauss-Seidel, at $\omega = 1$, the eigenvalues are $\lambda_1 = \tfrac14$, $\lambda_2 = 0$, and the spectral radius is
$\rho(T_1) = .25$. This is exactly the square of the Jacobi spectral radius, and hence the Gauss-Seidel
iterates converge twice as fast; so it takes, on average, only about $-1/\log_{10} .25 \approx 1.66$
Gauss-Seidel iterations to produce each new decimal place of accuracy. It can be shown
(Exercise 9.4.32) that as $\omega$ increases above 1, the two eigenvalues move along the real axis
towards each other. They coincide when

\[ \omega = \omega_\star = 8 - 4\sqrt{3} \approx 1.07, \quad \text{at which point} \quad \lambda_1 = \lambda_2 = \omega_\star - 1 \approx .07 = \rho(T_{\omega_\star}), \]

which is the convergence rate of the optimal SOR Method. Each iteration produces slightly
more than one new decimal place in the solution, which represents a significant improvement
over the Gauss-Seidel convergence rate. It takes about twice as many Gauss-Seidel
iterations (and four times as many Jacobi iterations) to produce the same accuracy as this
optimal SOR Method.
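
The dependence of the spectral radius on the relaxation parameter in this example can be checked numerically. The sketch below, whose function name `sor_spectral_radius` is an illustrative assumption, solves the quadratic $\lambda^2 - (2 - 2\omega + \tfrac14\omega^2)\lambda + (1-\omega)^2 = 0$ for the two eigenvalues and reports the larger modulus; complex arithmetic is used because the roots become a conjugate pair once $\omega$ exceeds $8 - 4\sqrt{3}$.

```python
import cmath
import math


def sor_spectral_radius(omega):
    """Spectral radius of the 2 x 2 SOR matrix of the example above.

    The eigenvalues are the roots of
        lambda^2 - trace * lambda + det = 0,
    with trace = 2 - 2*omega + omega^2/4 and det = (1 - omega)^2.
    """
    trace = 2 - 2 * omega + omega ** 2 / 4
    det = (1 - omega) ** 2
    disc = cmath.sqrt(trace ** 2 - 4 * det)
    return max(abs((trace + disc) / 2), abs((trace - disc) / 2))


omega_star = 8 - 4 * math.sqrt(3)
for omega in (0.8, 1.0, omega_star, 1.2, 1.5):
    print(f"omega = {omega:.4f}   rho = {sor_spectral_radius(omega):.4f}")

# Crude grid search over 0 < omega < 2 for the minimizing parameter.
grid = [i / 1000 for i in range(1, 2000)]
best = min(grid, key=sor_spectral_radius)
print("grid minimizer:", best, "  8 - 4*sqrt(3) =", round(omega_star, 4))
```

A grid search like this is only feasible because the matrix is tiny; for large systems one relies on formulas for the optimal parameter rather than on scanning.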

Of  course,  in  such  a  simple  2x2  example,  it  is  not  so  surprising  that  we  can  construct 
the  best  value  for  the  relaxation  parameter  by  hand.  Young  was  able  to  find  the  optimal 
value  of  the  relaxation  parameter  for  a  broad  class  of  matrices  that  includes  most  of  those 
arising  in  the  finite  difference  and  finite  element  numerical  solutions  to  ordinary  and  partial 
differential equations, [61]. For the matrices in Young’s class, the Jacobi eigenvalues
occur in signed pairs. If $\pm\mu$ are a pair of eigenvalues for the Jacobi Method, then the
corresponding eigenvalues of the SOR iteration matrix satisfy the quadratic equation

\[ (\lambda + \omega - 1)^2 = \lambda\, \omega^2 \mu^2. \tag{9.75} \]

If $\omega = 1$, so we have standard Gauss-Seidel, then $\lambda^2 = \lambda \mu^2$, and so the eigenvalues are
$\lambda = 0$, $\lambda = \mu^2$. The Gauss-Seidel spectral radius is therefore the square of the Jacobi
spectral radius, and so (at least for matrices in the Young class) its iterates converge twice
as fast. The quadratic equation (9.75) has the same properties as in the $2 \times 2$ version (9.74)
(which corresponds to the case $\mu = \tfrac12$), and hence the optimal value of $\omega$ will be the one
at which the two roots are equal:

\[ \lambda_1 = \lambda_2 = \omega - 1, \quad \text{which occurs when} \quad \omega = \frac{2 - 2\sqrt{1 - \mu^2}}{\mu^2} = \frac{2}{1 + \sqrt{1 - \mu^2}}. \]

Therefore, if $\rho_J = \max |\mu|$ denotes the spectral radius of the Jacobi Method, then the
Gauss-Seidel has spectral radius $\rho_{GS} = \rho_J^2$, while the SOR Method with optimal relaxation
parameter

\[ \omega_\star = \frac{2}{1 + \sqrt{1 - \rho_J^2}} \quad \text{has spectral radius} \quad \rho_\star = \omega_\star - 1. \tag{9.76} \]


For example, if $\rho_J = .99$, which is rather slow convergence (but common for iterative
numerical solution schemes for partial differential equations), then $\rho_{GS} = .9801$, which is
twice as fast, but still quite slow, while SOR with $\omega_\star = 1.7527$ has $\rho_\star = .7527$, which is
dramatically faster†. Indeed, since $\rho_\star \approx (\rho_{GS})^{14} \approx (\rho_J)^{28}$, it takes about 14 Gauss-Seidel
(and 28 Jacobi) iterations to produce the same accuracy as one SOR step. It is amazing
that such a simple idea can have such a dramatic effect.
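
The arithmetic in this comparison is easy to reproduce. The short Python sketch below evaluates the optimal relaxation parameter from the Jacobi spectral radius and estimates how many Gauss-Seidel or Jacobi steps match one optimal SOR step by comparing logarithms of the spectral radii; the function name `optimal_sor` is an illustrative assumption.

```python
import math


def optimal_sor(rho_jacobi):
    """Optimal relaxation parameter and SOR spectral radius, per (9.76).

    Assumes the matrix belongs to Young's class and 0 <= rho_jacobi < 1.
    """
    omega_star = 2 / (1 + math.sqrt(1 - rho_jacobi ** 2))
    return omega_star, omega_star - 1


rho_j = 0.99
omega_star, rho_star = optimal_sor(rho_j)
rho_gs = rho_j ** 2

# Number of steps of each method needed to match one optimal SOR step.
gs_per_sor = math.log(rho_star) / math.log(rho_gs)
jacobi_per_sor = math.log(rho_star) / math.log(rho_j)

print(f"omega* = {omega_star:.4f}, rho* = {rho_star:.4f}, rho_GS = {rho_gs:.4f}")
print(f"Gauss-Seidel steps per SOR step: {gs_per_sor:.1f}")
print(f"Jacobi steps per SOR step:       {jacobi_per_sor:.1f}")
```

With `rho_jacobi = 0.5` the same function returns the value $8 - 4\sqrt{3}$ found by hand in the $2 \times 2$ example.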


Exercises 


O  9.4.24. Consider the linear system $Au = b$, where $A =$


(a)  What  is  the  solution?  (b)  Discuss  the  convergence  of  the  Jacobi  iteration  method. 

(c)  Discuss  the  convergence  of  the  Gauss-Seidel  iteration  method,  (d)  Write  down  the 
explicit  formulas  for  the  SOR  Method,  (e)  What  is  the  optimal  value  of  the  relaxation 
parameter  uj  for  this  system?  How  much  faster  is  the  convergence  as  compared  to  the 

Jacobi and Gauss-Seidel Methods? (f) Suppose your initial guess is $u^{(0)} = 0$. Give an
estimate as to how many steps each iterative method (Jacobi, Gauss-Seidel, SOR) would
require in order to approximate the solution to the system to within 5 decimal places.

(g)  Verify  your  answer  by  direct  computation. 


4b  9.4.25.  In  Exercise  9.4.18  you  were  asked  to  solve  a  system  by  Gauss-Seidel.  How  much 

faster can you design an SOR scheme to converge? Experiment with several values of the
relaxation parameter $\omega$, and discuss what you find.


4b  9.4.26.  Investigate  the  three  basic  iterative  techniques  —  Jacobi,  Gauss-Seidel,  SOR  —  for 
solving  the  linear  system  K* u*  =  f*  for  the  cubical  circuit  in  Example  6.4. 


† More precisely, since the SOR matrix is not necessarily diagonalizable, the overall convergence
rate is slightly slower than the spectral radius. However, this technical detail does not affect the
overall conclusion.
4»  9.4.27. Consider the linear system

$4x - y - z = 1, \quad -x + 4y - w = 2, \quad -x + 4z - w = 0, \quad -y - z + 4w = 1.$

(a) Find the solution by using Gaussian Elimination and Back Substitution. (b) Using
0 as your initial guess, how many iterations are required to approximate the solution to
within five decimal places using (i) Jacobi iteration? (ii) Gauss-Seidel iteration? Can
you estimate the spectral radii of the relevant matrices in each case? (c) Try to find the
solution by using the SOR Method with the parameter $\omega$ taking various values between .5
and 1.5. Which value of $\omega$ gives the fastest convergence? What is the spectral radius of the
SOR matrix?


4b  9.4.28. (a) Find the spectral radius of the Jacobi and Gauss-Seidel iteration matrices when
$A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}$. (b) Is $A$ strictly diagonally dominant? (c) Use (9.76) to fix the
optimal value of the SOR parameter. Verify that the spectral radius of the resulting
iteration matrix agrees with the second formula in (9.76). (d) For each iterative method,
predict how many iterations are needed to solve the linear system $Ax = e_1$ to 4 decimal
places, and then verify your predictions by direct computation.


*  9.4.29. Change the matrix in Exercise 9.4.28 to $A = \begin{pmatrix} 2 & -1 & 0 & 0 \\ 1 & 2 & -1 & 0 \\ 0 & 1 & 2 & -1 \\ 0 & 0 & 1 & 2 \end{pmatrix}$ and answer the
same questions. Does the SOR Method with parameter given by (9.76) speed the iterations
up? Why not? Can you find a value of the SOR parameter that does?


4b  9.4.30.  Consider  the  linear  system  Au  =  e1  in  which  A  is  the  8x8  tridiagonal  matrix  with 

all 2’s on the main diagonal and all −1’s on the sub- and super-diagonals. (a) Use Exercise
8.2.47  to  find  the  spectral  radius  of  the  Jacobi  iteration  method  to  solve  Au  =  b.  Does  the 
Jacobi  Method  converge?  (b)  What  is  the  optimal  value  of  the  SOR  parameter  based  on 
(9.76)?  How  many  Jacobi  iterations  are  needed  to  match  the  effect  of  a  single  SOR  step? 

(c)  Test  out  your  conclusions  by  using  both  Jacobi  and  SOR  to  approximate  the  solution 
to  3  decimal  places. 


4»  9.4.31.  How  much  can  you  speed  up  the  convergence  of  the  iterative  solution  to  the 
pentadiagonal  linear  system  in  Exercise  9.4.22  when  z  =  4  using  SOR?  Discuss. 

0  9.4.32. For the matrix treated in Example 9.40, prove that (a) as $\omega$ increases from 1 to
$8 - 4\sqrt{3}$, the two eigenvalues move towards each other, with the larger one decreasing in
magnitude; (b) if $\omega > 8 - 4\sqrt{3}$, the eigenvalues are complex conjugates, with larger modulus
than the optimal value. (c) Can you conclude that $\omega_\star = 8 - 4\sqrt{3}$ is the optimal value for
the SOR parameter?

9.4.33. The matrix

\[
A = \begin{pmatrix}
4 & -1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
-1 & 4 & -1 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & -1 & 4 & 0 & 0 & -1 & 0 & 0 & 0 \\
-1 & 0 & 0 & 4 & -1 & 0 & -1 & 0 & 0 \\
0 & -1 & 0 & -1 & 4 & -1 & 0 & -1 & 0 \\
0 & 0 & -1 & 0 & -1 & 4 & 0 & 0 & -1 \\
0 & 0 & 0 & -1 & 0 & 0 & 4 & -1 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & -1 & 4 & -1 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & -1 & 4
\end{pmatrix}
\]

arises in the finite difference (and finite element) discretization of the Poisson equation
on a nine point square grid. Solve the linear system $Au = e_5$ using (a) Gaussian Elimination;
(b) Jacobi iteration; (c) Gauss-Seidel iteration; (d) SOR based on the Jacobi spectral radius.
4»  9.4.34. The generalization of Exercise 9.4.33 to an $n \times n$ grid results in an $n^2 \times n^2$ matrix in
block tridiagonal form

\[
A = \begin{pmatrix}
K & -I & & & \\
-I & K & -I & & \\
 & -I & K & -I & \\
 & & \ddots & \ddots & \ddots
\end{pmatrix},
\]

in which $K$ is the tridiagonal $n \times n$ matrix with 4’s on the main diagonal and −1’s on the
sub- and super-diagonals, while $I$ denotes the $n \times n$ identity matrix. Use the known value of the
Jacobi spectral radius $\rho_J = \cos \dfrac{\pi}{n+1}$, [86], to design an SOR Method to solve the linear
system $Au = f$. Run your method on the cases $n = 5$ and $f = e_{13}$ and $n = 25$ and $f = e_{313}$,
corresponding to a unit force at the center of the grid. How much faster is the convergence
rate of SOR than Jacobi and Gauss-Seidel?

O  9.4.35. If $u^{(k)}$ is an approximation to the solution to $Au = b$, then the residual vector
$r^{(k)} = b - Au^{(k)}$ measures how accurately the approximation solves the system.

(a) Show that the Jacobi iteration can be written in the form $u^{(k+1)} = u^{(k)} + D^{-1} r^{(k)}$.

(b) Show that the Gauss-Seidel iteration has the form $u^{(k+1)} = u^{(k)} + (L + D)^{-1} r^{(k)}$.

(c) Show that the SOR iteration has the form $u^{(k+1)} = u^{(k)} + \omega\, (\omega L + D)^{-1} r^{(k)}$.

(d) If $\| r^{(k)} \|$ is small, does this mean that $u^{(k)}$ is close to the solution? Explain your answer
and illustrate with a couple of examples.


9.4.36. Let $K$ be a positive definite $n \times n$ matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n > 0$. For
what values of $\varepsilon$ does the iterative system $u^{(k+1)} = u^{(k)} + \varepsilon\, r^{(k)}$, where $r^{(k)} = f - K u^{(k)}$ is
the current residual vector, converge to the solution to the linear system $Ku = f$? What is
the optimal value of $\varepsilon$, and what is the convergence rate?


9.5  Numerical  Computation  of  Eigenvalues 

The  importance  of  the  eigenvalues  of  a  square  matrix  in  a  broad  range  of  applications  is 
amply  demonstrated  in  this  chapter  and  its  successor.  However,  finding  the  eigenvalues 
and  associated  eigenvectors  is  not  such  an  easy  task.  The  direct  method  of  constructing  the 
characteristic  equation  of  the  matrix  through  the  determinantal  formula,  then  solving  the 
resulting  polynomial  equation  for  the  eigenvalues,  and  finally  producing  the  eigenvectors 
by  solving  the  associated  homogeneous  linear  system,  is  hopelessly  inefficient,  and  fraught 
with  numerical  pitfalls.  We  are  in  need  of  a  completely  new  idea  if  we  have  any  hopes  of 
designing  efficient  numerical  approximation  schemes. 

In  this  section,  we  develop  a  few  of  the  most  basic  numerical  algorithms  for  computing 
eigenvalues  and  eigenvectors.  All  are  iterative  in  nature.  The  most  direct  are  based  on  the 
connections  between  the  eigenvalues  and  the  high  powers  of  a  matrix.  A  more  sophisticated 
approach, based on the QR factorization that we learned in Section 4.3, will be presented
at  the  end  of  the  section.  Additional  computational  methods  for  eigenvalues  will  appear 
in  the  following  Section  9.6. 

The  Power  Method 

We  have  already  noted  the  role  played  by  the  eigenvalues  and  eigenvectors  in  the  solution  to 
linear  iterative  systems.  Now  we  are  going  to  turn  the  tables,  and  use  the  iterative  system 
as  a  mechanism  for  approximating  the  eigenvalues,  or,  more  correctly,  selected  eigenvalues 
of  the  coefficient  matrix.  The  simplest  of  the  resulting  computational  procedures  is  known 
as  the  Power  Method. 
We assume, for simplicity, that $A$ is a complete† $n \times n$ matrix. Let $v_1, \ldots, v_n$ denote its
eigenvector basis, and $\lambda_1, \ldots, \lambda_n$ the corresponding eigenvalues. As we have learned, the
solution to the linear iterative system

\[ v^{(k+1)} = A v^{(k)}, \qquad v^{(0)} = v, \tag{9.77} \]

is obtained by multiplying the initial vector $v$ by the successive powers of the coefficient
matrix: $v^{(k)} = A^k v$. If we write the initial vector in terms of the eigenvector basis

\[ v = c_1 v_1 + \cdots + c_n v_n, \tag{9.78} \]

then the solution takes the explicit form given in Theorem 9.4, namely

\[ v^{(k)} = A^k v = c_1 \lambda_1^k v_1 + \cdots + c_n \lambda_n^k v_n. \tag{9.79} \]

Suppose further that $A$ has a single dominant real eigenvalue, $\lambda_1$, that is larger than all
others in magnitude, so

\[ |\lambda_1| > |\lambda_j| \quad \text{for all} \quad j > 1. \tag{9.80} \]

As its name implies, this eigenvalue will eventually dominate the iteration (9.79). Indeed,
since

\[ |\lambda_1|^k \gg |\lambda_j|^k \quad \text{for all } j > 1 \text{ and all } k \gg 0, \]

the first term in the iterative formula (9.79) will eventually be much larger than the rest,
and so, provided $c_1 \ne 0$,

\[ v^{(k)} \approx c_1 \lambda_1^k v_1 \quad \text{for} \quad k \gg 0. \]

Therefore, the solution to the iterative system (9.77) will, almost always, end up being a
multiple of the dominant eigenvector of the coefficient matrix.

To compute the corresponding eigenvalue, we note that the $i$th entry of the iterate $v^{(k)}$
is approximated by $v_i^{(k)} \approx c_1 \lambda_1^k v_{1,i}$, where $v_{1,i}$ is the $i$th entry of the eigenvector $v_1$. Thus,
as long as $v_{1,i} \ne 0$, we can recover the dominant eigenvalue by taking a ratio between
selected components of successive iterates:

\[ \lambda_1 \approx \frac{v_i^{(k)}}{v_i^{(k-1)}}, \quad \text{provided that} \quad v_i^{(k-1)} \ne 0. \tag{9.81} \]


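Example 9.41. Consider the matrix $A = \begin{pmatrix} -1 & 2 & 2 \\ -1 & -4 & -2 \\ -3 & 9 & 7 \end{pmatrix}$. As you can check, its
eigenvalues and eigenvectors are

\[ \lambda_1 = 3, \ v_1 = \begin{pmatrix} 1 \\ -1 \\ 3 \end{pmatrix}, \qquad \lambda_2 = -2, \ v_2 = \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}, \qquad \lambda_3 = 1, \ v_3 = \begin{pmatrix} -1 \\ 1 \\ -2 \end{pmatrix}. \]

Repeatedly multiplying the initial vector $v = (1, 0, 0)^T$ by $A$ results in the iterates
$v^{(k)} = A^k v$ listed in the accompanying table. The last column indicates the ratio
$\lambda^{(k)} = v_1^{(k)} / v_1^{(k-1)}$ between the first components of successive iterates. (One could equally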


† This is not a very severe restriction. Most matrices are complete. Moreover, perturbations
caused by round-off and/or numerical inaccuracies will almost invariably make an incomplete
matrix complete.
\[
\begin{array}{r|rrr|l}
k & & v^{(k)} & & \lambda^{(k)} \\ \hline
0 & 1 & 0 & 0 & \\
1 & -1 & -1 & -3 & -1. \\
2 & -7 & 11 & -27 & 7. \\
3 & -25 & 17 & -69 & 3.5714 \\
4 & -79 & 95 & -255 & 3.1600 \\
5 & -241 & 209 & -693 & 3.0506 \\
6 & -727 & 791 & -2247 & 3.0166 \\
7 & -2185 & 2057 & -6429 & 3.0055 \\
8 & -6559 & 6815 & -19935 & 3.0018 \\
9 & -19681 & 19169 & -58533 & 3.0006 \\
10 & -59047 & 60071 & -178167 & 3.0002 \\
11 & -177145 & 175097 & -529389 & 3.0001 \\
12 & -531439 & 535535 & -1598415 & 3.0000
\end{array}
\]

well use the second or third components.) The ratios are converging to the dominant
eigenvalue $\lambda_1 = 3$, while the vectors are converging to a very large multiple of the
corresponding eigenvector $v_1 = (1, -1, 3)^T$.
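
The iterates and ratios of this example can be regenerated with a few lines of Python. The sketch below uses exact integer arithmetic for the vectors, so no round-off enters; the helper name `mat_vec` is an illustrative assumption.

```python
def mat_vec(A, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[-1, 2, 2],
     [-1, -4, -2],
     [-3, 9, 7]]

v = [1, 0, 0]           # initial vector v^(0)
iterates = [v]
ratios = []

for k in range(1, 13):
    w = mat_vec(A, v)
    # Ratio of first components of successive iterates, as in (9.81).
    ratios.append(w[0] / v[0])
    iterates.append(w)
    v = w
    print(k, w, round(ratios[-1], 4))
```

The entries grow roughly like powers of the dominant eigenvalue, which is why the unnormalized iteration is only practical for a modest number of steps.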


The success of the Power Method lies in the assumption that $A$ has a unique dominant
eigenvalue of maximal modulus, which, by definition, equals its spectral radius: $|\lambda_1| =
\rho(A)$. The rate of convergence of the method is governed by the ratio $|\lambda_2 / \lambda_1|$ between
the subdominant and dominant eigenvalues. Thus, the farther the dominant eigenvalue
lies away from the rest, the faster the Power Method converges. We also assumed that the
initial vector includes a nonzero multiple of the dominant eigenvector, i.e., $c_1 \ne 0$. As
we do not know the eigenvectors, it is not so easy to guarantee this in advance, although
one must be quite unlucky to make such a poor choice of initial vector. (Of course, the
stupid choice $v^{(0)} = 0$ is not counted.) Moreover, even if $c_1$ happens to be 0 initially,
numerical round-off error will typically come to one’s rescue, since it will almost inevitably
introduce a tiny component of the eigenvector $v_1$ into some iterate, and this component
will eventually dominate the computation. The trick is to wait long enough for it to appear!

Since the iterates of $A$ are, typically, getting either very large (when $\rho(A) > 1$)
or very small (when $\rho(A) < 1$), the iterated vectors will be increasingly subject to
numerical overflow or underflow, and the method may break down before a reasonable
approximation is achieved. One way to avoid this outcome is to restrict our attention
to unit vectors relative to a given norm, e.g., the Euclidean norm or the $\infty$ norm, since
their entries cannot be too large, and so are less likely to cause numerical errors in the
computations. As usual, the unit vector $u^{(k)} = \| v^{(k)} \|^{-1} v^{(k)}$ is obtained by dividing the
iterate by its norm; it can be computed directly by the modified iterative algorithm

\[ u^{(0)} = \frac{v^{(0)}}{\| v^{(0)} \|}, \quad \text{and} \quad u^{(k+1)} = \frac{A u^{(k)}}{\| A u^{(k)} \|}. \tag{9.82} \]

If the dominant eigenvalue is positive, $\lambda_1 > 0$, then $u^{(k)} \to u_1$ will converge to one of the
\[
\begin{array}{r|rrr|l}
k & & u^{(k)} & & \lambda \\ \hline
0 & 1 & 0 & 0 & \\
1 & -.3015 & -.3015 & -.9045 & -1.0000 \\
2 & -.2335 & .3669 & -.9005 & 7.0000 \\
3 & -.3319 & .2257 & -.9159 & 3.5714 \\
4 & -.2788 & .3353 & -.8999 & 3.1600 \\
5 & -.3159 & .2740 & -.9084 & 3.0506 \\
6 & -.2919 & .3176 & -.9022 & 3.0166 \\
7 & -.3080 & .2899 & -.9061 & 3.0055 \\
8 & -.2973 & .3089 & -.9035 & 3.0018 \\
9 & -.3044 & .2965 & -.9052 & 3.0006 \\
10 & -.2996 & .3048 & -.9041 & 3.0002 \\
11 & -.3028 & .2993 & -.9048 & 3.0001 \\
12 & -.3007 & .3030 & -.9043 & 3.0000
\end{array}
\]

two dominant unit eigenvectors (the other is $-u_1$). If $\lambda_1 < 0$, then the iterates will switch
back and forth between the two eigenvectors, so $u^{(k)} \approx \pm u_1$. In either case, the dominant
eigenvalue $\lambda_1$ is obtained as a limiting ratio between nonzero entries of $A u^{(k)}$ and $u^{(k)}$.
If some other sort of behavior is observed, it means that one of our assumptions is not
valid; either $A$ has more than one dominant eigenvalue of maximum modulus, e.g., it has
a complex conjugate pair of eigenvalues of largest modulus, or it is not complete. In such
cases, one can apply the more general long term behavior described in Exercise 9.2.8 to
pin down the dominant eigenvalues.
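
A minimal Python sketch of the normalized iteration follows, using the Euclidean norm. The function name `power_method`, the fixed number of steps, and the choice to form the eigenvalue ratio at the entry of largest magnitude (to stay away from zero entries) are illustrative assumptions rather than prescriptions from the text.

```python
import math


def mat_vec(A, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def power_method(A, v, steps=50):
    """Normalized power iteration u <- A u / ||A u|| (Euclidean norm).

    Returns an estimate of the dominant eigenvalue together with the
    final unit vector. Assumes a single dominant real eigenvalue and
    that no iterate is mapped to the zero vector.
    """
    norm = math.sqrt(sum(x * x for x in v))
    u = [x / norm for x in v]
    lam = None
    for _ in range(steps):
        w = mat_vec(A, u)
        # Ratio between matching entries of A u and u, taken where |u_i| is largest.
        i = max(range(len(u)), key=lambda j: abs(u[j]))
        lam = w[i] / u[i]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    return lam, u


A = [[-1, 2, 2],
     [-1, -4, -2],
     [-3, 9, 7]]
lam, u = power_method(A, [1.0, 0.0, 0.0])
print(round(lam, 6), [round(x, 4) for x in u])
```

Since the unit vectors stay bounded, the number of steps can be increased freely without risking overflow, in contrast to the unnormalized iteration.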

Example 9.42. For the matrix considered in Example 9.41, starting the iterative system
(9.82) with $u^{(0)} = (1, 0, 0)^T$, the resulting unit vectors are tabulated above. The
last column, being the ratio between the first components of $A u^{(k-1)}$ and $u^{(k-1)}$, again
converges to the dominant eigenvalue $\lambda_1 = 3$.

Variants  of  the  Power  Method  for  computing  the  other  eigenvalues  of  the  matrix  are 
explored  in  the  exercises. 

Remark.  See  Wilkinson,  [90;  Chapter  2]  for  the  perturbation  theory  of  eigenvalues,  i.e., 
how  they  can  behave  under  small  perturbations  of  the  matrix.  Wilkinson  defines  a  spectral 
condition  number  to  equal  the  product  of  the  norms  of  the  matrix  used  to  place  the 
matrix  in  Jordan  canonical  form  and  its  inverse.  The  larger  the  spectral  condition  number, 
the  more  the  eigenvalues  deviate  under  perturbation.  In  particular  symmetric  matrices 
have  spectral  condition  number  =  1,  and  so  their  eigenvalues  are  well  behaved  under 
perturbations.  He  also  gives  examples  of  highly  ill-conditioned  matrices.  Similarly,  in 
[69;  Section  3.3],  Saad  defines  a  condition  number  for  an  individual  simple  eigenvalue,  and 
proves  that  it  is  the  reciprocal  of  the  cosine  of  the  angle  between  its  eigenvectors  and 
co-eigenvectors  (left  eigenvectors). 
Exercises


9.5.1.  Use  the  Power  Method  to  find  the  dominant  eigenvalue  and  associated  eigenvector  of  the 
following  matrices: 


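(a), (b), (e), (f): [matrix entries not recoverable from the source]

(c) $\begin{pmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{pmatrix}$, (d) $\begin{pmatrix} -2 & 0 & 1 \\ -3 & -2 & 0 \\ -2 & 5 & 4 \end{pmatrix}$,

(g) $\begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix}$, (h) $\begin{pmatrix} 4 & 1 & 0 & 1 \\ 1 & 4 & 1 & 0 \\ 0 & 1 & 4 & 1 \\ 1 & 0 & 1 & 4 \end{pmatrix}$.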

9.5.2.  Use  the  Power  Method  to  find  the  largest  singular  value  of  the  following  matrices: 

3  1  - 


(a) 


1 

1 


(b) 


2 

2 


2 

2 


(d) 


1  -2 
\  2  -1 


♠ 9.5.3. Let $T_n$ be the tridiagonal matrix whose diagonal entries are all equal to 2 and whose
sub- and super-diagonal entries all equal 1. Use the Power Method to find the dominant
eigenvalue of $T_n$ for $n = 10, 20, 50$. Do your values agree with those in Exercise 8.2.47? How
many iterations do you require to obtain 4 decimal place accuracy?


♦ 9.5.4. Prove that, for the iterative method (9.82), $\| A\mathbf{u}^{(k)} \| \to |\lambda_1|$. Assuming $\lambda_1$ is real,
explain how to deduce its sign.


♦ 9.5.5. The Inverse Power Method. Let $A$ be a nonsingular matrix. (a) Show that the
eigenvalues of $A^{-1}$ are the reciprocals $1/\lambda$ of the eigenvalues of $A$. How are the eigenvectors
related? (b) Show how to use the Power Method on $A^{-1}$ to produce the smallest
(in modulus) eigenvalue of $A$. (c) What is the rate of convergence of the algorithm?
(d) Design a practical iterative algorithm based on the (permuted) LU decomposition of $A$.

♠ 9.5.6. Apply the Inverse Power Method of Exercise 9.5.5 to find the smallest eigenvalue of
the matrices in Exercise 9.5.1.


♦ 9.5.7. The Shifted Inverse Power Method. Suppose that $\mu$ is not an eigenvalue of $A$.
(a) Show that the iterative system $\mathbf{u}^{(k+1)} = (A - \mu I)^{-1}\mathbf{u}^{(k)}$ converges to the eigenvector
of $A$ corresponding to the eigenvalue $\lambda^\star$ that is closest to $\mu$. Explain how to find the
eigenvalue $\lambda^\star$. (b) What is the rate of convergence of the algorithm? (c) What happens if
$\mu$ is an eigenvalue?

♠ 9.5.8. Apply the Shifted Inverse Power Method of Exercise 9.5.7 to find the eigenvalue
closest to $\mu = .5$ of the matrices in Exercise 9.5.1.

9.5.9. Suppose that $A\mathbf{u}^{(k)} = \mathbf{0}$ in the iterative procedure (9.82). What does this indicate?

♠ 9.5.10. (i) Explain how to use the Deflation Method of Exercise 8.2.51 to find the
subdominant eigenvalue of a nonsingular matrix $A$. (ii) Apply your method to the
matrices listed in Exercise 9.5.1.


The  QR  Algorithm 

As  stated,  the  Power  Method  produces  only  the  dominant  (largest  in  magnitude)  eigenvalue 
of  a  matrix  A.  The  Inverse  Power  Method  of  Exercise  9.5.5  can  be  used  to  find  the  smallest 
eigenvalue.  Additional  eigenvalues  can  be  found  by  using  the  Shifted  Inverse  Power  Method 
of  Exercise  9.5.7,  or  the  Deflation  Method  of  Exercise  9.5.10.  However,  if  we  need  to  know 
all the eigenvalues, such piecemeal methods are too time-consuming to be of much practical
value. 

The  most  popular  scheme  for  simultaneously  approximating  all  the  eigenvalues  of  a 
matrix  A  is  the  remarkable  QR  algorithm,  first  proposed  in  1961  by  John  Francis,  [29], 
and  Vera  Kublanovskaya,  [51].  The  underlying  idea  is  simple,  but  surprising.  The  first 
step  is  to  factor  the  matrix 

$$A = A_0 = Q_0 R_0$$

into a product of an orthogonal matrix $Q_0$ and a positive (i.e., with all positive entries along
the diagonal) upper triangular matrix $R_0$ by using the Gram-Schmidt orthogonalization
procedure of Theorem 4.24, or, even better, the numerically stable version described in
(4.28). Next, multiply the two factors together in the wrong order! The result is the new
matrix

$$A_1 = R_0 Q_0.$$

We then repeat these two steps. Thus, we next factor

$$A_1 = Q_1 R_1$$

using the Gram-Schmidt process, and then multiply the factors in the reverse order to
produce

$$A_2 = R_1 Q_1.$$

The complete algorithm can be written as

$$A = A_0 = Q_0 R_0, \qquad A_{k+1} = R_k Q_k = Q_{k+1} R_{k+1}, \qquad k = 0, 1, 2, \ldots, \tag{9.83}$$

where $Q_k, R_k$ come from the previous step, and the subsequent orthogonal matrix $Q_{k+1}$
and positive upper triangular matrix $R_{k+1}$ are computed directly from $A_{k+1} = R_k Q_k$ by
applying the numerically stable form of the Gram-Schmidt algorithm.

The astonishing fact is that, for many matrices $A$ with all real eigenvalues, the iterates
$A_k \to V$ converge to an upper triangular matrix $V$ whose diagonal entries are the eigenvalues
of $A$. Thus, after a sufficient number of iterations, say $m$, the matrix $A_m$ will have very
small entries below the diagonal, and one can read off a complete system of (approximate)
eigenvalues along its diagonal. For each eigenvalue, the computation of the corresponding
eigenvector can be most efficiently accomplished by applying the Shifted Inverse Power
Method of Exercise 9.5.7 with parameter $\mu$ chosen near the computed eigenvalue.
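The factor-then-reverse-multiply loop (9.83) can be sketched as follows. This is a minimal illustration, not a production eigenvalue solver: the helper names `matmul`, `gram_schmidt_qr`, and `qr_algorithm` are ours, the factorization is a modified Gram-Schmidt variant standing in for the stable version the text cites, and the input is assumed to be square and nonsingular.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def gram_schmidt_qr(A):
    """Factor a nonsingular square A as Q R with Q orthogonal and R upper
    triangular with positive diagonal (modified Gram-Schmidt)."""
    n = len(A)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(n)]             # j-th column of A
        for i in range(j):
            R[i][j] = sum(q_cols[i][k] * v[k] for k in range(n))
            v = [v[k] - R[i][j] * q_cols[i][k] for k in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in v))  # positive diagonal entry
        q_cols.append([x / R[j][j] for x in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def qr_algorithm(A, iterations):
    """Return A_m after m = iterations steps of A_{k+1} = R_k Q_k."""
    Ak = [row[:] for row in A]
    for _ in range(iterations):
        Q, R = gram_schmidt_qr(Ak)
        Ak = matmul(R, Q)       # multiply the factors in the reverse order
    return Ak

A = [[2.0, 1.0], [2.0, 3.0]]    # eigenvalues 4 and 1
A1 = qr_algorithm(A, 1)
A20 = qr_algorithm(A, 20)
print(A1)
print(A20)
```

For this 2 x 2 matrix the entry below the diagonal shrinks by roughly a factor of 4 per step, which is the ratio of the two eigenvalues.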

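Example 9.43. Consider the matrix $A = \begin{pmatrix} 2 & 1 \\ 2 & 3 \end{pmatrix}$. The initial Gram-Schmidt factorization $A = Q_0 R_0$ yields

$$Q_0 \approx \begin{pmatrix} .7071 & -.7071 \\ .7071 & .7071 \end{pmatrix}, \qquad R_0 \approx \begin{pmatrix} 2.8284 & 2.8284 \\ 0 & 1.4142 \end{pmatrix}.$$

These are multiplied in the reverse order to give

$$A_1 = R_0 Q_0 = \begin{pmatrix} 4 & 0 \\ 1 & 1 \end{pmatrix}.$$

We refactor $A_1 = Q_1 R_1$ via Gram-Schmidt, and then reverse multiply to produce

$$Q_1 \approx \begin{pmatrix} .9701 & -.2425 \\ .2425 & .9701 \end{pmatrix}, \qquad R_1 \approx \begin{pmatrix} 4.1231 & .2425 \\ 0 & .9701 \end{pmatrix}, \qquad A_2 = R_1 Q_1 \approx \begin{pmatrix} 4.0588 & -.7647 \\ .2353 & .9412 \end{pmatrix}.$$

The next iteration yields

$$Q_2 \approx \begin{pmatrix} .9983 & -.0579 \\ .0579 & .9983 \end{pmatrix}, \qquad R_2 \approx \begin{pmatrix} 4.0656 & -.7090 \\ 0 & .9839 \end{pmatrix}, \qquad A_3 = R_2 Q_2 \approx \begin{pmatrix} 4.0178 & -.9431 \\ .0569 & .9822 \end{pmatrix}.$$

Continuing in this manner, after 9 iterations we obtain, to four decimal places,

$$Q_9 \approx \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad R_9 \approx \begin{pmatrix} 4 & -1 \\ 0 & 1 \end{pmatrix}, \qquad A_{10} = R_9 Q_9 \approx \begin{pmatrix} 4 & -1 \\ 0 & 1 \end{pmatrix}.$$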

The  eigenvalues  of  A ,  namely  4  and  1,  appear  along  the  diagonal  of  A10.  Additional 
iterations  produce  very  little  further  change,  although  they  can  be  used  for  increasing  the 
numerical  accuracy  of  the  computed  eigenvalues. 

If the original matrix $A$ happens to be symmetric and positive definite, then the limiting
matrix $A_k \to V = \Lambda$ is, in fact, the diagonal matrix containing the eigenvalues of $A$.
Moreover, in this case, the recursively defined matrices

$$S_k = S_{k-1} Q_k = Q_0 Q_1 \cdots Q_{k-1} Q_k \tag{9.84}$$

have, as their limit, $S_k \to S$, an orthogonal matrix whose columns are the
orthonormal eigenvector basis of $A$.

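Example 9.44. Consider the symmetric matrix $A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 3 & -1 \\ 0 & -1 & 6 \end{pmatrix}$. The initial $A = Q_0 R_0$ factorization produces

$$S_0 = Q_0 \approx \begin{pmatrix} .8944 & -.4082 & -.1826 \\ .4472 & .8165 & .3651 \\ 0 & -.4082 & .9129 \end{pmatrix}, \qquad R_0 \approx \begin{pmatrix} 2.2361 & 2.2361 & -.4472 \\ 0 & 2.4495 & -3.2660 \\ 0 & 0 & 5.1121 \end{pmatrix},$$

and so

$$A_1 = R_0 Q_0 \approx \begin{pmatrix} 3.0000 & 1.0954 & 0 \\ 1.0954 & 3.3333 & -2.0870 \\ 0 & -2.0870 & 4.6667 \end{pmatrix}.$$

We refactor $A_1 = Q_1 R_1$ and reverse multiply to produce

$$Q_1 \approx \begin{pmatrix} .9393 & -.2734 & -.2071 \\ .3430 & .7488 & .5672 \\ 0 & -.6038 & .7971 \end{pmatrix}, \qquad S_1 = Q_0 Q_1 \approx \begin{pmatrix} .7001 & -.4400 & -.5623 \\ .7001 & .2686 & .6615 \\ -.1400 & -.8569 & .4962 \end{pmatrix},$$

$$R_1 \approx \begin{pmatrix} 3.1937 & 2.1723 & -.7158 \\ 0 & 3.4565 & -4.3804 \\ 0 & 0 & 2.5364 \end{pmatrix}, \qquad A_2 = R_1 Q_1 \approx \begin{pmatrix} 3.7451 & 1.1856 & 0 \\ 1.1856 & 5.2330 & -1.5314 \\ 0 & -1.5314 & 2.0219 \end{pmatrix}.$$

Continuing in this manner, after 10 iterations we have

$$Q_{10} \approx \begin{pmatrix} 1.0000 & -.0067 & 0 \\ .0067 & 1.0000 & .0001 \\ 0 & -.0001 & 1.0000 \end{pmatrix}, \qquad S_{10} \approx \begin{pmatrix} .0753 & -.5667 & -.8205 \\ .3128 & -.7679 & .5591 \\ -.9468 & -.2987 & .1194 \end{pmatrix},$$

$$R_{10} \approx \begin{pmatrix} 6.3229 & .0647 & 0 \\ 0 & 3.3582 & -.0006 \\ 0 & 0 & 1.3187 \end{pmatrix}, \qquad A_{11} \approx \begin{pmatrix} 6.3232 & .0224 & 0 \\ .0224 & 3.3581 & -.0002 \\ 0 & -.0002 & 1.3187 \end{pmatrix}.$$

After 20 iterations, the process has completely settled down, and

$$Q_{20} \approx \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad S_{20} \approx \begin{pmatrix} .0710 & -.5672 & -.8205 \\ .3069 & -.7702 & .5590 \\ -.9491 & -.2915 & .1194 \end{pmatrix},$$

$$R_{20} \approx \begin{pmatrix} 6.3234 & .0001 & 0 \\ 0 & 3.3579 & 0 \\ 0 & 0 & 1.3187 \end{pmatrix}, \qquad A_{21} \approx \begin{pmatrix} 6.3234 & 0 & 0 \\ 0 & 3.3579 & 0 \\ 0 & 0 & 1.3187 \end{pmatrix}.$$

The eigenvalues of $A$ appear along the diagonal of $A_{21}$, while the columns of $S_{20}$ are the
corresponding orthonormal eigenvector basis, listed in the same order as the eigenvalues,
both correct to 4 decimal places.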
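For a symmetric positive definite matrix, the accumulation (9.84) of the orthogonal factors can be added to the basic loop. The sketch below is an assumption-laden illustration: `symmetric_qr_eig` and its helpers are our own names, the factorization is a modified Gram-Schmidt stand-in, and a fixed iteration count replaces any convergence test. It is run on the matrix of Example 9.44.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def gram_schmidt_qr(A):
    """A = Q R with positive diagonal in R (modified Gram-Schmidt)."""
    n = len(A)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(n)]
        for i in range(j):
            R[i][j] = sum(q_cols[i][k] * v[k] for k in range(n))
            v = [v[k] - R[i][j] * q_cols[i][k] for k in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / R[j][j] for x in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def symmetric_qr_eig(A, iterations):
    """Return (approximate eigenvalues, S) where S accumulates the Q factors."""
    n = len(A)
    Ak = [row[:] for row in A]
    S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        Q, R = gram_schmidt_qr(Ak)
        S = matmul(S, Q)        # S_k = S_{k-1} Q_k, so that S_0 = Q_0
        Ak = matmul(R, Q)       # A_{k+1} = R_k Q_k
    return [Ak[i][i] for i in range(n)], S

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 6.0]]
eigenvalues, S = symmetric_qr_eig(A, 60)
print([round(x, 4) for x in eigenvalues])
for row in S:
    print([round(x, 4) for x in row])
```

Column $j$ of the returned `S` approximates a unit eigenvector for the $j$-th diagonal entry, which can be checked by comparing $A\mathbf{s}_j$ with $\lambda_j \mathbf{s}_j$.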

We will devote the remainder of this section to a justification of the QR algorithm for
a class of matrices. We will assume that $A$ is symmetric, and that its (necessarily real)
eigenvalues satisfy

$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n| > 0. \tag{9.85}$$

According to the Spectral Theorem 8.38, the corresponding unit eigenvectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$
(in the Euclidean norm) form an orthonormal basis of $\mathbb{R}^n$. Our analysis can be adapted to
a broader class of matrices, but this will suffice to expose the main ideas without unduly
complicating the exposition.

The secret is that the QR algorithm is, in fact, a well-disguised adaptation of the more
primitive Power Method. If we were to use the Power Method to capture all the eigenvectors
and eigenvalues of $A$, the first thought might be to try to perform it simultaneously on
a complete basis $\mathbf{v}_1^{(0)}, \ldots, \mathbf{v}_n^{(0)}$ of $\mathbb{R}^n$ instead of just one individual vector. The problem
is that, for almost all vectors, the power iterates $\mathbf{v}_j^{(k)} = A^k \mathbf{v}_j^{(0)}$ all tend to a multiple of
the dominant eigenvector $\mathbf{u}_1$. Normalizing the vectors at each step, as in (9.82), is not
any better, since then they merely converge to one of the two dominant unit eigenvectors
$\pm\mathbf{u}_1$. However, if, inspired by the form of the eigenvector basis, we orthonormalize the
vectors at each step, then we effectively prevent them from all accumulating at the same
dominant unit eigenvector, and so, with some luck, the resulting vectors will converge to the
full system of eigenvectors. Since orthonormalizing a basis via the Gram-Schmidt process
is equivalent to a QR matrix factorization, the mechanics of the algorithm becomes less
surprising.

In detail, we start with any orthonormal basis, which, for simplicity, we take to be the
standard basis vectors of $\mathbb{R}^n$, and so $\mathbf{u}_1^{(0)} = \mathbf{e}_1, \ldots, \mathbf{u}_n^{(0)} = \mathbf{e}_n$. At the $k$th stage of the
algorithm, we set $\mathbf{u}_1^{(k)}, \ldots, \mathbf{u}_n^{(k)}$ to be the orthonormal vectors that result from applying
the Gram-Schmidt algorithm to the power vectors $\mathbf{v}_j^{(k)} = A^k \mathbf{e}_j$. In matrix language, the
vectors $\mathbf{v}_1^{(k)}, \ldots, \mathbf{v}_n^{(k)}$ are merely the columns of $A^k$, and the orthonormal basis vectors $\mathbf{u}_1^{(k)}, \ldots, \mathbf{u}_n^{(k)}$
are the columns of the orthogonal matrix $S_k$ in the QR decomposition of the $k$th power of
$A$, which we denote by

$$A^k = S_k P_k, \tag{9.86}$$

where $P_k$ is positive upper triangular, meaning all its diagonal entries are positive. Note
that, in view of (9.83),

$$A = Q_0 R_0, \qquad A^2 = Q_0 R_0 Q_0 R_0 = Q_0 Q_1 R_1 R_0,$$

$$A^3 = Q_0 R_0 Q_0 R_0 Q_0 R_0 = Q_0 Q_1 R_1 Q_1 R_1 R_0 = Q_0 Q_1 Q_2 R_2 R_1 R_0,$$

and, in general,

$$A^k = (Q_0 Q_1 \cdots Q_{k-1})\,(R_{k-1} \cdots R_1 R_0). \tag{9.87}$$
Proposition 4.23 tells us that the product of orthogonal matrices is also orthogonal. The
product of positive upper triangular matrices is also positive upper triangular. Therefore,
comparing (9.86,87) and invoking the uniqueness of the QR factorization, we conclude
that

$$S_k = Q_0 Q_1 \cdots Q_{k-1} = S_{k-1} Q_{k-1}, \qquad P_k = R_{k-1} \cdots R_1 R_0 = R_{k-1} P_{k-1}. \tag{9.88}$$

Let $S = (\, \mathbf{u}_1 \ \mathbf{u}_2 \ \ldots \ \mathbf{u}_n \,)$ denote an orthogonal matrix whose columns are unit eigenvectors
of $A$. The Spectral Theorem 8.38 tells us that

$$A = S \Lambda S^T, \qquad \text{where} \quad \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$$

is the diagonal eigenvalue matrix. Substituting the spectral factorization into (9.86) yields

$$A^k = S \Lambda^k S^T = S_k P_k.$$


We now make one additional assumption on the matrix $A$ by requiring that $S^T$ be a
regular matrix, meaning that it can be factored, $S^T = LU$, as the product of a lower
unitriangular matrix and an upper triangular matrix. We can further assume, without loss
of generality, that the diagonal entries of $U$ (that is, the pivots of $S^T$) are all positive.
Indeed, by Exercise 1.3.31, this can be arranged by multiplying each row of $S^T$ by the sign
of its pivot, which amounts to possibly replacing some of the unit eigenvectors by their
negatives $-\mathbf{u}_j$, which is allowed, since it does not affect their status as an orthonormal
eigenvector basis. Regularity of $S^T$ holds generically, and is the analogue of the condition
that our initial vector in the Power Method includes a nonzero component of the dominant
eigenvector.

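Under these two assumptions,

$$A^k = S \Lambda^k L U = S_k P_k, \qquad \text{and hence} \qquad S \Lambda^k L = S_k P_k U^{-1}.$$

Multiplying on the right by $\Lambda^{-k}$, we obtain

$$S \Lambda^k L \Lambda^{-k} = S_k T_k, \qquad \text{where} \quad T_k = P_k U^{-1} \Lambda^{-k} \tag{9.89}$$

is also a positive upper triangular matrix, since $P_k$, $U$, $\Lambda$ are all of that form.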

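Now consider what happens as $k \to \infty$. The entries of the lower triangular matrix
$N = \Lambda^k L \Lambda^{-k}$ are

$$n_{ij} = \begin{cases} l_{ij}\,(\lambda_i/\lambda_j)^k, & i > j, \\ l_{ii} = 1, & i = j, \\ 0, & i < j. \end{cases}$$

Since we are assuming $|\lambda_i| < |\lambda_j|$ when $i > j$, we immediately deduce that

$$\Lambda^k L \Lambda^{-k} \to I, \qquad \text{and hence} \qquad S_k T_k = S \Lambda^k L \Lambda^{-k} \to S \quad \text{as} \quad k \to \infty.$$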
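As a small worked instance of this limit (a sketch for $n = 2$ only, writing $l_{21}$ for the single subdiagonal entry of $L$), the conjugation by powers of $\Lambda$ can be multiplied out by hand:

$$\Lambda^k L \Lambda^{-k} = \begin{pmatrix} \lambda_1^k & 0 \\ 0 & \lambda_2^k \end{pmatrix} \begin{pmatrix} 1 & 0 \\ l_{21} & 1 \end{pmatrix} \begin{pmatrix} \lambda_1^{-k} & 0 \\ 0 & \lambda_2^{-k} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ l_{21}\,(\lambda_2/\lambda_1)^k & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad \text{since} \quad |\lambda_2/\lambda_1| < 1.$$

The subdiagonal entry decays geometrically with ratio $|\lambda_2/\lambda_1|$. For the eigenvalues found in Example 9.44, the slowest such ratio is $3.3579/6.3234 \approx .53$, which is consistent with the modest number of iterations needed there.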


We  now  appeal  to  the  following  lemma,  whose  proof  will  be  given  after  we  finish  the 
justification  of  the  QR  algorithm. 


Lemma 9.45. Let $S_1, S_2, \ldots$ and $S$ be orthogonal matrices and $T_1, T_2, \ldots$ positive upper
triangular matrices. Then $S_k T_k \to S$ as $k \to \infty$ if and only if $S_k \to S$ and $T_k \to I$.


Lemma 9.45 implies that, as claimed, the orthogonal matrices $S_k$ do converge to the
orthogonal eigenvector matrix $S$. Moreover, by (9.88-89), $P_k = T_k \Lambda^k U$, and so

$$R_{k-1} = P_k P_{k-1}^{-1} = ( T_k \Lambda^k U )\,( T_{k-1} \Lambda^{k-1} U )^{-1} = T_k \Lambda\, T_{k-1}^{-1}.$$

Since both $T_k$ and $T_{k-1}$ converge to the identity matrix, $R_k$ converges to the diagonal
eigenvalue matrix $\Lambda$, as claimed. The eigenvalues appear in decreasing order along the
diagonal; this is a consequence of our regularity assumption on the transposed eigenvector
matrix $S^T$.

Theorem 9.46. If $A$ is positive definite with all simple eigenvalues, and its transposed
eigenvector matrix $S^T$ is regular, then the matrices $S_k \to S$ and $R_k \to \Lambda$ appearing in the
QR algorithm applied to $A$ converge to, respectively, the eigenvector matrix $S$ and the
diagonal eigenvalue matrix $\Lambda$.


Remark. If $A$ is symmetric and has all simple eigenvalues, then, for suitably large $\alpha \gg 0$,
the shifted matrix $\widetilde{A} = A + \alpha I$ is positive definite, has the same eigenvectors as $A$, and has
simple shifted eigenvalues $\widetilde{\lambda}_k = \lambda_k + \alpha$. Thus, one can run the QR algorithm to determine
the eigenvalues and eigenvectors of $\widetilde{A}$, and hence those of $A$ by undoing the shift.

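The last remaining item is a proof of Lemma 9.45. We write

$$S = (\, \mathbf{u}_1 \ \mathbf{u}_2 \ \ldots \ \mathbf{u}_n \,), \qquad S_k = (\, \mathbf{u}_1^{(k)} \ \mathbf{u}_2^{(k)} \ \ldots \ \mathbf{u}_n^{(k)} \,),$$

in columnar form. Let $t_{ij}^{(k)}$ denote the entries of the positive upper triangular matrix $T_k$.
Since $T_k$ is upper triangular, the first column of the limiting equation $S_k T_k \to S$ reads $t_{11}^{(k)} \mathbf{u}_1^{(k)} \to \mathbf{u}_1$. Since both
$\mathbf{u}_1^{(k)}$ and $\mathbf{u}_1$ are unit vectors, and $t_{11}^{(k)} > 0$, it follows that

$$\| t_{11}^{(k)} \mathbf{u}_1^{(k)} \| = t_{11}^{(k)} \to 1, \qquad \text{and hence the first column} \quad \mathbf{u}_1^{(k)} \to \mathbf{u}_1.$$

The second column reads

$$t_{12}^{(k)} \mathbf{u}_1^{(k)} + t_{22}^{(k)} \mathbf{u}_2^{(k)} \to \mathbf{u}_2.$$

Taking the inner product with $\mathbf{u}_1^{(k)} \to \mathbf{u}_1$ and using orthonormality, we deduce $t_{12}^{(k)} \to 0$,
and so $t_{22}^{(k)} \mathbf{u}_2^{(k)} \to \mathbf{u}_2$, which, by the previous reasoning, implies $t_{22}^{(k)} \to 1$ and
$\mathbf{u}_2^{(k)} \to \mathbf{u}_2$. The proof is completed by working forwards through the remaining
columns, using a similar argument at each step. The remaining details are left to the
interested reader.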


Exercises 


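9.5.11. Apply the QR algorithm to the following symmetric matrices to find their eigenvalues
and eigenvectors to 2 decimal places:

(a) $\begin{pmatrix} 1 & 2 \\ 2 & 6 \end{pmatrix}$, (b) $\begin{pmatrix} 3 & -1 \\ -1 & 5 \end{pmatrix}$, (c) $\begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 3 \\ 0 & 3 & 1 \end{pmatrix}$, (d) $\begin{pmatrix} 2 & 5 & 0 \\ 5 & 0 & -3 \\ 0 & -3 & 3 \end{pmatrix}$,

(e) $\begin{pmatrix} 3 & -1 & 0 & 0 \\ -1 & 3 & -1 & 0 \\ 0 & -1 & 3 & -1 \\ 0 & 0 & -1 & 3 \end{pmatrix}$, (f) $\begin{pmatrix} 6 & 1 & -1 & 0 \\ 1 & 8 & 1 & -1 \\ -1 & 1 & 4 & 1 \\ 0 & -1 & 1 & 3 \end{pmatrix}$.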

9.5.12.  Show  that  applying  the  QR  algorithm  to  the  matrix  A  = 

diagonal  matrix  with  the  eigenvalues  on  the  diagonal,  but  not  in  decreasing  order.  Explain. 


4 

-1 

i\ 

-1 

7 

2 

results  in  a 

1 

2 

V 
9.5.13. Apply the QR algorithm to the following non-symmetric matrices to find their
eigenvalues to 3 decimal places:

(a) $\begin{pmatrix} -1 & -2 \\ 3 & 4 \end{pmatrix}$, (b) $\begin{pmatrix} 2 & 3 \\ 1 & 5 \end{pmatrix}$, (c) $\begin{pmatrix} 2 & 1 & 0 \\ 2 & 0 & -3 \\ 0 & -2 & 1 \end{pmatrix}$, (d) $\begin{pmatrix} 2 & 5 & 1 \\ 2 & -1 & 3 \\ 4 & 5 & 3 \end{pmatrix}$,

(e) [matrix entries not recoverable from the source]

9.5.14. The matrix $A = \begin{pmatrix} -1 & 2 & 1 \\ -2 & 3 & 1 \\ -2 & 2 & 2 \end{pmatrix}$ has a double eigenvalue of 1, and so our proof of
convergence of the QR algorithm doesn't apply. Does the QR algorithm find its eigenvalues?


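9.5.15. Explain why the QR algorithm fails to find the eigenvalues of the matrices

(a) [matrix entries not recoverable from the source], (b) $\begin{pmatrix} -2 & 1 & 0 \\ 0 & -2 & 1 \\ 1 & 0 & -2 \end{pmatrix}$, (c) $\begin{pmatrix} 5 & -4 & 2 \\ -4 & 5 & 2 \\ 2 & 2 & -1 \end{pmatrix}$.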

♦ 9.5.16. Prove that all of the matrices $A_k$ defined in (9.83) have the same eigenvalues.

♦ 9.5.17. (a) Prove that if $A$ is symmetric and tridiagonal, then all matrices $A_k$ appearing in the
QR algorithm are also symmetric and tridiagonal. Hint: First prove symmetry.

(b) Is the result true if $A$ is not symmetric, only tridiagonal?


Tridiagonalization 

In practical implementations, the direct QR algorithm often takes overly long before
providing reasonable approximations to the eigenvalues of large matrices. Fortunately, the
algorithm can be made much more efficient by a simple preprocessing step. The key
observation is that the QR algorithm preserves the class of symmetric tridiagonal matrices,
and, like Gaussian Elimination, is much faster when applied to this class. Moreover, by
applying a sequence of Householder reflection matrices (4.35), we can convert any symmetric
matrix into tridiagonal form while preserving all the eigenvalues. Thus, by first using
the Householder tridiagonalization process, and then applying the QR Method to the
resulting tridiagonal matrix, we obtain an efficient and practical algorithm for computing
eigenvalues of large symmetric matrices. Generalizations to non-symmetric matrices will
be briefly considered at the end of the section.

In  Householder’s  approach  to  the  QR  factorization,  we  were  able  to  convert  the  matrix  A 
to  upper  triangular  form  R  by  a  sequence  of  elementary  reflection  matrices.  Unfortunately, 
this  procedure  does  not  preserve  the  eigenvalues  of  the  matrix  —  the  diagonal  entries  of 
R  are  not  the  eigenvalues  —  and  so  we  need  to  be  a  bit  more  clever  here.  We  begin  by 
recalling,  from  Exercise  8.2.32,  that  similar  matrices  have  the  same  eigenvalues  (but  not 
the  same  eigenvectors). 

Lemma 9.47. If $H = I - 2\,\mathbf{u}\mathbf{u}^T$ is an elementary reflection matrix, with $\mathbf{u} \in \mathbb{R}^n$ a unit
vector (under the Euclidean norm), then $A$ and $B = HAH$ are similar matrices and hence
have the same eigenvalues.

Proof: It suffices to note that, according to (4.37), $H^{-1} = H$, and hence $B = H^{-1}AH$ is
similar to $A$. Q.E.D.

Now, starting with a symmetric $n \times n$ matrix $A$, our goal is to devise a similar tridiagonal
matrix by applying a sequence of Householder reflections. Using the Euclidean norm, we
begin by setting

$$\mathbf{x}_1 = \begin{pmatrix} 0 \\ a_{21} \\ a_{31} \\ \vdots \\ a_{n1} \end{pmatrix}, \qquad \mathbf{y}_1 = \begin{pmatrix} 0 \\ \pm r_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad \text{where} \quad r_1 = \| \mathbf{x}_1 \| = \| \mathbf{y}_1 \|,$$

so that $\mathbf{x}_1$ contains all the off-diagonal entries of the first column of $A$. Let

$$H_1 = I - 2\,\mathbf{u}_1 \mathbf{u}_1^T, \qquad \text{where} \quad \mathbf{u}_1 = \frac{\mathbf{x}_1 - \mathbf{y}_1}{\| \mathbf{x}_1 - \mathbf{y}_1 \|},$$

be the corresponding elementary reflection matrix that maps $\mathbf{x}_1$ to $\mathbf{y}_1$. Either the plus or
the minus sign in the formula for $\mathbf{y}_1$ works in the algorithm; a good choice is to set it to be
the opposite of the sign of the entry $a_{21}$, which helps minimize the possible effects of round-off
error in computing the unit vector $\mathbf{u}_1$. By direct computation, based on Lemma 4.28
and the fact that the first entry of $\mathbf{u}_1$ is zero, we obtain

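$$A_2 = H_1 A H_1 = \begin{pmatrix} a_{11} & \pm r_1 & 0 & \ldots & 0 \\ \pm r_1 & \widetilde{a}_{22} & \widetilde{a}_{23} & \ldots & \widetilde{a}_{2n} \\ 0 & \widetilde{a}_{32} & \widetilde{a}_{33} & \ldots & \widetilde{a}_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \widetilde{a}_{n2} & \widetilde{a}_{n3} & \ldots & \widetilde{a}_{nn} \end{pmatrix} \tag{9.90}$$

for certain $\widetilde{a}_{ij}$, whose explicit formulae are not needed. Thus, by a single Householder
transformation, we convert $A$ into a similar matrix $A_2$ whose first row and column are in
tridiagonal form. We repeat the process on the lower right $(n-1) \times (n-1)$ submatrix of
$A_2$. We set

$$\mathbf{x}_2 = \begin{pmatrix} 0 \\ 0 \\ \widetilde{a}_{32} \\ \widetilde{a}_{42} \\ \vdots \\ \widetilde{a}_{n2} \end{pmatrix}, \qquad \mathbf{y}_2 = \begin{pmatrix} 0 \\ 0 \\ \pm r_2 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad \text{where} \quad r_2 = \| \mathbf{x}_2 \| = \| \mathbf{y}_2 \|,$$

and the $\pm$ sign is chosen to be the opposite of that of $\widetilde{a}_{32}$. Setting

$$H_2 = I - 2\,\mathbf{u}_2 \mathbf{u}_2^T, \qquad \text{where} \quad \mathbf{u}_2 = \frac{\mathbf{x}_2 - \mathbf{y}_2}{\| \mathbf{x}_2 - \mathbf{y}_2 \|},$$

we construct the similar matrix

$$A_3 = H_2 A_2 H_2 = \begin{pmatrix} a_{11} & \pm r_1 & 0 & 0 & \ldots & 0 \\ \pm r_1 & \widetilde{a}_{22} & \pm r_2 & 0 & \ldots & 0 \\ 0 & \pm r_2 & \widehat{a}_{33} & \widehat{a}_{34} & \ldots & \widehat{a}_{3n} \\ 0 & 0 & \widehat{a}_{43} & \widehat{a}_{44} & \ldots & \widehat{a}_{4n} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \widehat{a}_{n3} & \widehat{a}_{n4} & \ldots & \widehat{a}_{nn} \end{pmatrix}$$

whose first two rows and columns are now in tridiagonal form. The remaining steps in the
algorithm should now be clear. Thus, the final result is a tridiagonal matrix $T = A_{n-1}$ that
has the same eigenvalues (but not the same eigenvectors) as the original symmetric matrix
$A$. Let us illustrate the method by an example.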
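Example 9.48. To tridiagonalize $A = \begin{pmatrix} 4 & 1 & 1 & 2 \\ 1 & 4 & 1 & -1 \\ 1 & 1 & 4 & 1 \\ 2 & -1 & 1 & 4 \end{pmatrix}$, we begin with its first
column. We set $\mathbf{x}_1 = (0, 1, 1, 2)^T$, so that $\mathbf{y}_1 = (0, -\sqrt{6}, 0, 0)^T \approx (0, -2.4495, 0, 0)^T$. Therefore, the unit
vector and corresponding Householder matrix are

$$\mathbf{u}_1 = \frac{\mathbf{x}_1 - \mathbf{y}_1}{\| \mathbf{x}_1 - \mathbf{y}_1 \|} \approx \begin{pmatrix} 0 \\ .8391 \\ .2433 \\ .4865 \end{pmatrix}, \qquad H_1 = I - 2\,\mathbf{u}_1 \mathbf{u}_1^T \approx \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -.4082 & -.4082 & -.8165 \\ 0 & -.4082 & .8816 & -.2367 \\ 0 & -.8165 & -.2367 & .5266 \end{pmatrix}.$$

We compute

$$A_2 = H_1 A H_1 \approx \begin{pmatrix} 4.0000 & -2.4495 & 0 & 0 \\ -2.4495 & 2.3333 & -.3865 & -.8599 \\ 0 & -.3865 & 4.9440 & -.1246 \\ 0 & -.8599 & -.1246 & 4.7227 \end{pmatrix}.$$

In the next phase,

$$\mathbf{x}_2 \approx \begin{pmatrix} 0 \\ 0 \\ -.3865 \\ -.8599 \end{pmatrix}, \qquad \mathbf{y}_2 \approx \begin{pmatrix} 0 \\ 0 \\ .9428 \\ 0 \end{pmatrix}, \qquad \text{so} \quad \mathbf{u}_2 \approx \begin{pmatrix} 0 \\ 0 \\ -.8396 \\ -.5431 \end{pmatrix},$$

and

$$H_2 = I - 2\,\mathbf{u}_2 \mathbf{u}_2^T \approx \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -.4100 & -.9121 \\ 0 & 0 & -.9121 & .4100 \end{pmatrix}.$$

The resulting matrix

$$T = A_3 = H_2 A_2 H_2 \approx \begin{pmatrix} 4.0000 & -2.4495 & 0 & 0 \\ -2.4495 & 2.3333 & .9428 & 0 \\ 0 & .9428 & 4.6667 & 0 \\ 0 & 0 & 0 & 5 \end{pmatrix}$$

is now in tridiagonal form.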


Since the final tridiagonal matrix $T$ has the same eigenvalues as $A$, we can apply the QR
algorithm to $T$ to approximate the common eigenvalues. According to Exercise 9.5.17, if
$A = A_1$ is tridiagonal, so are all its QR iterates $A_2, A_3, \ldots$. Moreover, far fewer arithmetic
operations are required; in Exercise 9.5.25, you are asked to quantify this. For instance,
in the preceding example, after we apply 20 iterations of the QR algorithm directly to $T$,
the upper triangular factor has become

$$R_{20} \approx \begin{pmatrix} 6.0000 & -.0065 & 0 & 0 \\ 0 & 4.5616 & 0 & 0 \\ 0 & 0 & 5.0000 & 0 \\ 0 & 0 & 0 & .4384 \end{pmatrix}.$$


The  eigenvalues  of  T,  and  hence  also  of  A,  appear  along  the  diagonal,  and  are  correct 
to  4  decimal  places.  As  noted  earlier,  with  the  eigenvalues  in  hand  the  corresponding 
eigenvectors  can  then  be  found  via  the  Shifted  Inverse  Power  Method  of  Exercise  9.5.7. 
Finally, even if $A$ is not symmetric, one can still apply the same sequence of Householder
reflections to simplify it. The final result is no longer tridiagonal, but rather a similar upper
Hessenberg matrix, which means that all entries below its subdiagonal are zero, but those
above its superdiagonal are not necessarily zero. For instance, a $5 \times 5$ upper Hessenberg
matrix looks like

$$\begin{pmatrix} * & * & * & * & * \\ * & * & * & * & * \\ 0 & * & * & * & * \\ 0 & 0 & * & * & * \\ 0 & 0 & 0 & * & * \end{pmatrix},$$

where the starred entries can be anything. It can be proved that the QR algorithm
maintains the upper Hessenberg form, and, while not as efficient as in the tridiagonal
case, still yields a significant savings in computational effort required to find the common
eigenvalues.


If $A$ has no eigenvalues of the same magnitude, which, in particular, requires all its
eigenvalues to be simple, then application of the tridiagonal QR algorithm to its
tridiagonalization will, usually, produce its eigenvalues. More generally, if $A$ has $k$ eigenvalues of
the same magnitude, then the QR algorithm, applied either directly to $A$, or to its
tridiagonalization, will, again generically, converge to a block upper triangular matrix, with a
$k \times k$ matrix in the block diagonal slot that has these same eigenvalues. Thus, for example,
if $A$ is a real matrix with simple real and complex eigenvalues, then each complex conjugate
pair will be the eigenvalues of one of the $2 \times 2$ matrices appearing on the diagonal of the
eventual QR iterates, while the real eigenvalues will appear directly (in a $1 \times 1$ "block")
on the diagonal.

Further details and results can be found in [21, 66, 69, 89, 90].


Exercises 

9.5.18.  Use  Householder  matrices  to  convert  the  following  matrices  into  tridiagonal  form: 

$$\begin{pmatrix} 4 & 0 & -1 & 1 \\ 0 & 1 & 0 & -1 \\ -1 & 0 & 2 & 0 \\ 1 & -1 & 0 & 3 \end{pmatrix}$$


♠ 9.5.19. Find the eigenvalues, to 2 decimal places, of the matrices in Exercise 9.5.18 by applying
the QR algorithm to the tridiagonal form.

♠ 9.5.20. Use the tridiagonal QR Method to find the singular values of $A = \begin{pmatrix} 2 & 2 & 1 & -1 \\ 1 & -2 & 0 & 1 \\ 0 & -1 & 2 & 2 \end{pmatrix}$.

9.5.21.  Use  Householder  matrices  to  convert  the  following  matrices  into  upper  Hessenberg  form: 


(2  -1  9\ 

( 3  2  -1  1\ 

h-1 

o 

h-1 

h-1 

(a) 

1  3-4 

,  (b) 

2  4  0  1 

0  1  2-6 

.  (c) 

2  11-1 
-10  13 

V2  -1  -1/ 

Vl  0  -5  1  / 

V  3  -1  1  4/ 

4b  9.5.22.  Find  the  eigenvalues,  to  2  decimal  places,  of  the  matrices  in  Exercise  9.5.21  by  applying 
the  QR  algorithm  to  the  upper  Hessenberg  form. 


9.5.23.  Prove  that  the  effect  of  the  first  Householder  reflection  is  as  given  in  (9.90). 

9.5.24.  What  is  the  effect  of  tridiagonalization  on  the  eigenvectors  of  the  matrix? 
♦ 9.5.25. (a) How many arithmetic operations (multiplications/divisions and additions/subtractions)
are required to place a generic $n \times n$ symmetric matrix into tridiagonal
form? (b) How many operations are needed to perform one iteration of the QR algorithm
on  an  n  x  n  tridiagonal  matrix?  (c)  How  much  faster,  on  average,  is  the  tridiagonal 
algorithm  than  the  direct  QR  algorithm  for  finding  the  eigenvalues  of  a  symmetric  matrix? 

9.5.26.  Write  out  a  pseudocode  program  to  tridiagonalize  a  matrix.  The  input  should  be  an 
n  x  n  matrix  A,  and  the  output  should  be  the  Householder  unit  vectors  u1? . . . ,  un  l  and 
the  tridiagonal  matrix  R.  Does  your  program  produce  the  upper  Hessenberg  form  when  the 
input  matrix  is  not  symmetric? 

0  9.5.27.  Prove  that  in  the  H  =  LU  factorization  of  a  regular  upper  Hessenberg  matrix,  the 
lower  triangular  factor  L  is  bidiagonal,  as  in  (1.67). 


9.6  Krylov  Subspace  Methods 

So  far,  we  have  established  two  broad  classes  of  algorithms  for  solving  linear  systems. 
The  first,  known  as  direct  methods ,  are  based  on  some  version  of  Gaussian  Elimination  or 
matrix factorization. Direct methods eventually† obtain the exact solution, but must be
carried  through  to  completion  before  any  useful  information  is  obtained.  The  second  class 
contains  the  iterative  methods  discussed  above  that  lead  to  closer  and  closer  approximations 
to  the  solution,  but  almost  never  reach  the  exact  value.  One  might  ask  whether  there 
are  algorithms  that  combine  the  best  of  both:  semi-direct  methods  whose  intermediate 
computations  lead  to  closer  and  closer  approximations,  and,  moreover,  are  guaranteed  to 
terminate  in  a  finite  number  of  steps  with  the  exact  solution  in  hand. 

In  recent  years,  for  dealing  with  large  sparse  linear  systems,  such  as  those  arising  from 
the  numerical  solution  of  partial  differential  equations,  semi-direct  iterative  methods  based 
on  Krylov  subspaces  have  become  quite  popular.  The  original  ideas  were  introduced  in  the 
1930’s  by  the  Russian  naval  engineer  Alexei  Krylov,  who  was  in  search  of  an  efficient  and 
reliable  method  for  numerically  computing  eigenvalues.  Krylov  methods  have  seen  much 
development in a variety of directions, [32, 70, 85], and we will show how they can be used
to  iteratively  solve  linear  systems  and  to  compute  eigenvalues. 

Krylov  Subspaces 

The starting point is an $n \times n$ matrix $A$, assumed to be real, although extensions to complex
matrices are relatively straightforward. In applications, $A$ is both large and sparse, meaning
that most of its entries are $0$, and so multiplying $A$ by a vector $\mathbf{v} \in \mathbb{R}^n$ to produce the
vector $A\mathbf{v}$ is an efficient operation.

Recall that the Power Method for computing the dominant eigenvalue and eigenvector
of $A$ is based on successive iterates applied to a randomly chosen initial vector:
$\mathbf{v}, A\mathbf{v}, A^2\mathbf{v}, A^3\mathbf{v}, \dots$. We will employ these particular vectors to span a collection of subspaces.

Definition 9.49. Given an $n \times n$ real matrix $A$, the Krylov subspace of order $k \geq 1$
generated by a nonzero vector $\mathbf{0} \neq \mathbf{v} \in \mathbb{R}^n$ is the subspace $V^{(k)} \subset \mathbb{R}^n$ spanned by the
vectors $\mathbf{v}, A\mathbf{v}, A^2\mathbf{v}, \dots, A^{k-1}\mathbf{v}$. We also set $V^{(0)} = \{\mathbf{0}\}$ by convention.


† This assumes that we are dealing with a fully accurate implementation, i.e., without round-off
or other numerical error. In this discussion, numerical instability will be left aside as a separate,
albeit ultimately important, concern.

For example, if $\mathbf{v}$ is an eigenvector of $A$, so $A\mathbf{v} = \lambda\mathbf{v}$, then $V^{(k)} = V^{(1)}$ is the one-dimensional
eigenspace spanned by $\mathbf{v}$; conversely, if $V^{(2)}$ is one-dimensional, then $\mathbf{v}$ is
necessarily an eigenvector, and hence $V^{(k)} = V^{(1)}$ for all $k \geq 1$. More generally, if
$V^{(j+1)} = V^{(j)}$ for some $j \geq 0$, then $V^{(k)} = V^{(j)}$ for all $k \geq j$. This is easily proved
by induction: by assumption, $A^j\mathbf{v} \in V^{(j)}$, and thus can be written as a linear combination

$$ A^j\mathbf{v} = c_1\mathbf{v} + c_2 A\mathbf{v} + \cdots + c_{j-1}A^{j-2}\mathbf{v} + c_j A^{j-1}\mathbf{v} \in V^{(j)} $$

for some scalars $c_1, \dots, c_j$. Thus,

$$ \begin{aligned} A^{j+1}\mathbf{v} &= c_1 A\mathbf{v} + c_2 A^2\mathbf{v} + \cdots + c_{j-1}A^{j-1}\mathbf{v} + c_j A^j\mathbf{v} \\ &= c_j c_1\mathbf{v} + (c_1 + c_j c_2)A\mathbf{v} + \cdots + (c_{j-2} + c_j c_{j-1})A^{j-2}\mathbf{v} + (c_{j-1} + c_j^2)A^{j-1}\mathbf{v} \in V^{(j)} \end{aligned} $$

also, proving that $V^{(j+2)} = V^{(j)}$. The general induction step is clear.

Since we assumed $\mathbf{v} \neq \mathbf{0}$, as otherwise all $V^{(k)} = \{\mathbf{0}\}$ are trivial and not of interest, this
argument implies the existence of an integer $m \in \mathbb{N}$, called the stabilization order, such
that $\dim V^{(k)} = k$ for $k = 1, \dots, m$, while $V^{(k)} = V^{(m)}$ has dimension $m$ for all $k \geq m$.
Since we are working in $\mathbb{R}^n$, clearly $m \leq n$; Exercise 9.6.3 gives a stricter bound for $m$
in terms of the degree of the minimal polynomial of the matrix $A$, as defined in Exercise
8.6.23. We also note the following useful result.

Lemma 9.50. Suppose $V^{(k)} \neq V^{(k-1)}$. Let $\mathbf{w} \in V^{(k)} \setminus V^{(k-1)}$. Then $A\mathbf{w} \in V^{(k+1)}$ and,
moreover, $V^{(k+1)}$ is spanned by $A\mathbf{w}$ and (a basis of) $V^{(k)}$. Moreover, if $A\mathbf{w} \in V^{(k)}$, then
$V^{(k+1)} = V^{(k)}$ and the Krylov subspaces stabilize at order $k$.


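Proof: By assumption,

$$ \mathbf{w} = c_1\mathbf{v} + c_2 A\mathbf{v} + \cdots + c_{k-1}A^{k-2}\mathbf{v} + c_k A^{k-1}\mathbf{v} $$

for some scalars $c_1, \dots, c_k$ with $c_k \neq 0$. Thus, as above,

$$ A\mathbf{w} = c_1 A\mathbf{v} + c_2 A^2\mathbf{v} + \cdots + c_{k-1}A^{k-1}\mathbf{v} + c_k A^k\mathbf{v} \in V^{(k+1)}. \tag{9.91} $$

If $A\mathbf{w} \in V^{(k)}$, the left-hand side of (9.91) is a linear combination of $\mathbf{v}, A\mathbf{v}, A^2\mathbf{v}, \dots, A^{k-1}\mathbf{v}$,
and hence, since $c_k \neq 0$, so is $A^k\mathbf{v}$, which implies $V^{(k+1)} = V^{(k)}$. Otherwise, (9.91)
implies that $A^k\mathbf{v}$ is a linear combination of $A\mathbf{w}$ and $A\mathbf{v}, A^2\mathbf{v}, \dots, A^{k-1}\mathbf{v}$, and thus every
vector in $V^{(k+1)}$ can be written as a linear combination of $A\mathbf{w}$ and the Krylov vectors
$\mathbf{v}, A\mathbf{v}, A^2\mathbf{v}, \dots, A^{k-1}\mathbf{v} \in V^{(k)}$. Q.E.D.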
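The stabilization order can be observed on small examples using exact rational arithmetic. The following Python sketch is only an illustration: the helper names `matvec`, `rank`, and `stabilization_order`, as well as the diagonal test matrix, are assumptions introduced here and are not part of the text. It appends Krylov vectors one at a time and stops as soon as the newest one depends linearly on its predecessors.

```python
from fractions import Fraction

def matvec(A, v):
    """Multiply the square matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rank(vectors):
    """Rank of a list of vectors, by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def stabilization_order(A, v):
    """Smallest m such that A^m v is a linear combination of v, Av, ..., A^(m-1) v.

    Assumes v is nonzero, as in Definition 9.49."""
    krylov = [list(v)]
    while True:
        krylov.append(matvec(A, krylov[-1]))
        if rank(krylov) < len(krylov):      # the newest Krylov vector is dependent
            return len(krylov) - 1

# Hypothetical test matrix with a repeated eigenvalue.
A = [[2, 0, 0],
     [0, 3, 0],
     [0, 0, 3]]
print(stabilization_order(A, [1, 0, 0]))    # eigenvector start: prints 1
print(stabilization_order(A, [1, 1, 1]))    # prints 2, although n = 3
```

In the second call the order is 2 rather than 3 because the matrix has only two distinct eigenvalues, which is the kind of bound in terms of the minimal polynomial that Exercise 9.6.3 refers to.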


For simplicity in what follows, we will assume that $A$ has all real eigenvalues; for example,
$A$ might be a symmetric matrix. We further assume that $A$ has a unique dominant
eigenvalue $\lambda_1$, so that $\lambda_1$ is a simple eigenvalue, and $|\lambda_1| > |\lambda_j|$ for all $j > 1$. In this case,
as we know from our earlier analysis, for most initial choices of the vector $\mathbf{v}$, the vectors
used to define the Krylov subspace tend to scalar multiples of a dominant eigenvector $\mathbf{v}_1$,
meaning that $A^k\mathbf{v} \approx c\,\lambda_1^k\mathbf{v}_1$ as $k \to \infty$, for some scalar $c$. Thus, the Krylov vectors in and of
themselves contain increasingly little information, particularly in a numerical environment. As with the
Power Method, matrices with several dominant eigenvalues, including real matrices with
complex conjugate eigenvalues and matrices for which $\pm\lambda_1$ are both eigenvalues, require
suitable modifications of the methods.


Arnoldi  Iteration 

The way to get around the pure power behavior was already introduced in the design of
the QR algorithm: instead of the Krylov vectors, one constructs an orthonormal basis of
the Krylov subspace using the Gram-Schmidt process. (As above, we work with the dot
product $\mathbf{v} \cdot \mathbf{w} = \mathbf{v}^T\mathbf{w}$ and corresponding Euclidean norm throughout this presentation,
leaving the investigation of other inner products to the motivated reader.) To this end, we
may as well start with a unit vector, and so replace the initial vector $\mathbf{v}$ by the unit vector
$\mathbf{u}_1 = \mathbf{v}/\|\mathbf{v}\|$, so $\|\mathbf{u}_1\| = 1$, which spans the initial Krylov subspace $V^{(1)}$. The second order
subspace $V^{(2)}$ will be spanned by the vectors $\mathbf{u}_1$ and $A\mathbf{u}_1$, and we extract an orthonormal
basis by projection. First, according to our orthogonal projection formulas, the vector

$$ \mathbf{v}_2 = A\mathbf{u}_1 - h_{11}\mathbf{u}_1, \qquad \text{where} \qquad h_{11} = \mathbf{u}_1^T A\mathbf{u}_1, $$

satisfies the desired orthogonality condition $\mathbf{u}_1 \cdot \mathbf{v}_2 = 0$. If $\mathbf{v}_2 = \mathbf{0}$, then $\mathbf{u}_1$ is an eigenvector
of $A$, and the process terminates, since the Krylov subspaces would immediately stabilize:
$V^{(k)} = V^{(1)}$ for all $k \geq 1$. Otherwise, we replace $\mathbf{v}_2$ by the unit vector

$$ \mathbf{u}_2 = \frac{\mathbf{v}_2}{h_{21}}, \qquad \text{where} \qquad h_{21} = \|\mathbf{v}_2\|, $$

and deduce that $\mathbf{u}_1$ and $\mathbf{u}_2$ form an orthonormal basis for $V^{(2)}$. Proceeding in this manner,
assuming that $k < m$, the stabilization order, at the $k$th stage, we have already computed
orthonormal vectors $\mathbf{u}_1, \dots, \mathbf{u}_k$ such that $\mathbf{u}_1, \dots, \mathbf{u}_j$ form an orthonormal basis of $V^{(j)}$ for
each $j = 1, \dots, k$. Taking $\mathbf{w} = \mathbf{u}_k$ in Lemma 9.50, we deduce that $\mathbf{u}_1, \dots, \mathbf{u}_k$ and $A\mathbf{u}_k$
span $V^{(k+1)}$. Our orthogonal projection formula (4.41) implies that

$$ \mathbf{v}_{k+1} = A\mathbf{u}_k - \sum_{j=1}^{k} h_{jk}\mathbf{u}_j, \qquad \text{where} \qquad h_{jk} = \mathbf{u}_j^T A\mathbf{u}_k, \tag{9.92} $$

lies in $V^{(k+1)}$ and is orthogonal to $\mathbf{u}_1, \dots, \mathbf{u}_k$. If $\mathbf{v}_{k+1} = \mathbf{0}$, then $A\mathbf{u}_k \in V^{(k)}$, and, again
by Lemma 9.50, the Krylov spaces have stabilized with $V^{(k+1)} = V^{(k)}$. Otherwise, let

$$ \mathbf{u}_{k+1} = \frac{\mathbf{v}_{k+1}}{h_{k+1,k}}, \qquad \text{where} \qquad h_{k+1,k} = \|\mathbf{v}_{k+1}\|, \tag{9.93} $$

be the corresponding unit vector, so that $\mathbf{u}_1, \dots, \mathbf{u}_{k+1}$ form an orthonormal basis of $V^{(k+1)}$,
as desired.

While the preceding algorithm will work in favorable situations, the preferred method,
known as Arnoldi iteration, named after the mid-twentieth-century American engineer
Walter Arnoldi, employs the stabilized Gram-Schmidt process described in Section 4.2,
thereby ameliorating, as much as possible, potential numerical instabilities. Thus, at step
$k \geq 1$, having $\mathbf{u}_1, \dots, \mathbf{u}_k$ in hand, one iteratively computes

$$ \mathbf{v}_{k+1}^{(1)} = A\mathbf{u}_k, \qquad \mathbf{v}_{k+1}^{(j+1)} = \mathbf{v}_{k+1}^{(j)} - h_{jk}\mathbf{u}_j, \quad \text{where} \quad h_{jk} = \mathbf{u}_j^T\mathbf{v}_{k+1}^{(j)}, \quad \text{for} \quad j = 1, \dots, k. \tag{9.94} $$

We then set $\mathbf{v}_{k+1} = \mathbf{v}_{k+1}^{(k+1)}$ and, if it is nonzero, use (9.93) to define the next orthonormal
basis vector $\mathbf{u}_{k+1}$. In Exercise 9.6.6 you are asked to prove that the resulting Arnoldi
vectors $\mathbf{u}_k$ and coefficients $h_{jk}$ are the same as in (9.92, 93) (if computed exactly).
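The stabilized sweep can be written out in a few lines. The Python sketch below is an illustration under stated assumptions, not a production routine: the function name `arnoldi`, the tolerance `tol` used to detect stabilization, and the storage of the coefficients in a nested list are all choices made here. Each projection coefficient is computed from the partially reduced vector, and the iteration stops early if the Krylov subspaces stabilize.

```python
import math

def matvec(A, v):
    """Multiply the square matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def arnoldi(A, v, steps, tol=1e-12):
    """Run up to `steps` Arnoldi steps starting from the nonzero vector v.

    Returns (U, H): U lists the Arnoldi vectors u_1, u_2, ... and
    H[j][k] holds the coefficient h_{j+1,k+1} (zero-based storage).
    Without early stabilization, U has steps+1 vectors and H is
    (steps+1) x steps, i.e. H_k with one extra bottom row."""
    norm_v = math.sqrt(dot(v, v))
    U = [[x / norm_v for x in v]]
    H = [[0.0] * steps for _ in range(steps + 1)]
    for k in range(steps):
        w = matvec(A, U[k])
        for j in range(k + 1):              # stabilized Gram-Schmidt sweep
            H[j][k] = dot(U[j], w)          # coefficient from the *updated* w
            w = [wi - H[j][k] * uj for wi, uj in zip(w, U[j])]
        H[k + 1][k] = math.sqrt(dot(w, w))
        if H[k + 1][k] < tol:               # Krylov subspaces have stabilized
            return U, [row[:k + 1] for row in H[:k + 2]]
        U.append([wi / H[k + 1][k] for wi in w])
    return U, H

# Hypothetical symmetric test matrix: the coefficients come out tridiagonal.
A = [[3.0, -1.0, 0.0],
     [-1.0, 2.0, 1.0],
     [0.0, 1.0, 1.0]]
U, H = arnoldi(A, [1.0, 2.0, -1.0], 2)
for row in H:
    print([round(h, 6) for h in row])
```

Since the test matrix is symmetric, the entries `H[0][1]` and `H[1][0]` agree up to round-off, in line with the Lanczos special case.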

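It is instructive to formulate the Arnoldi orthonormalization process in matrix form.
First note that we can rewrite (9.92-93) as

$$ A\mathbf{u}_k = \sum_{j=1}^{k+1} h_{jk}\mathbf{u}_j, \tag{9.95} $$

and hence, by orthonormality,

$$ h_{jk} = \begin{cases} \mathbf{u}_j^T A\mathbf{u}_k, & 1 \leq j \leq k+1, \\ 0, & j \geq k+2. \end{cases} \tag{9.96} $$

Let $Q_k = (\,\mathbf{u}_1 \ \mathbf{u}_2 \ \dots \ \mathbf{u}_k\,)$ denote the $n \times k$ matrix whose columns are the first $k$ Arnoldi
vectors. Since these are orthonormal, it follows that

$$ Q_k^T Q_k = I. \tag{9.97} $$

(However, keep in mind that $Q_k$ is a rectangular matrix, and so $Q_k Q_k^T$ is in general not
the identity matrix.) Let

$$ H_k = \begin{pmatrix}
h_{11} & h_{12} & h_{13} & h_{14} & \dots & h_{1,k-2} & h_{1,k-1} & h_{1k} \\
h_{21} & h_{22} & h_{23} & h_{24} & \dots & h_{2,k-2} & h_{2,k-1} & h_{2k} \\
0 & h_{32} & h_{33} & h_{34} & \dots & h_{3,k-2} & h_{3,k-1} & h_{3k} \\
0 & 0 & h_{43} & h_{44} & \dots & h_{4,k-2} & h_{4,k-1} & h_{4k} \\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \dots & h_{k-1,k-2} & h_{k-1,k-1} & h_{k-1,k} \\
0 & 0 & 0 & 0 & \dots & 0 & h_{k,k-1} & h_{kk}
\end{pmatrix} \tag{9.98} $$

be the $k \times k$ upper Hessenberg matrix formed by the coefficients $h_{jk}$ given in (9.96), which
implies that

$$ Q_k^T A Q_k = H_k. \tag{9.99} $$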

In particular, if $A$ is symmetric, then so is $H_k$, which implies that it is also tridiagonal. In
this case, the Arnoldi algorithm is known as the symmetric Lanczos algorithm, after the
Hungarian mathematician Cornelius Lanczos.

Equation (9.99) yields an alternative interpretation of the Arnoldi iteration as a (partial)
orthogonal reduction of $A$ to Hessenberg or, in the symmetric case, tridiagonal form. The
matrix $H_k$ can be viewed as the representation of the orthogonal projection of $A$ onto the
Krylov subspace $V^{(k)}$ in terms of the basis formed by the Arnoldi vectors $\mathbf{u}_1, \dots, \mathbf{u}_k$. Thus,
we can identify $H_k$ with the (projected) action of $A$ on the subspace $V^{(k)}$ and, as such, its
dominant eigenvalues and eigenvectors, which can be computed using the QR algorithm,
are expected to form good approximations to those of $A$ itself. Since its predecessor, $H_{k-1}$,
coincides with the upper left $(k-1) \times (k-1)$ submatrix of $H_k$, the QR factorizations of the
Hessenberg coefficient matrices $H_k$ can be sped up by an iterative procedure; see [70]
for details. One can also use Householder reflections to tridiagonalize $H_k$ before applying
QR. Of course, if $A$ is symmetric, then, as noted above, $H_k$ is already tridiagonal and
so this step is superfluous. Moreover, if the method is carried out to the stabilization
order $m$, the resulting Krylov subspace $V^{(m)}$ is invariant under $A$, and hence the eigenvalues of
$H_m$ coincide with those of $A$ restricted to $V^{(m)}$, cf. Exercise 8.4.5. In this manner, the
Arnoldi/Lanczos algorithm produces a semi-direct method for approximating eigenvalues
of the matrix $A$. Again, the Shifted Inverse Power Method of Exercise 9.5.7 can then be
used to compute each corresponding eigenvector.

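We further note, as a consequence of the first equation in (9.95), the following formula
relating the Arnoldi matrix $Q_k$ to its successor $Q_{k+1}$:

$$ A Q_k = Q_{k+1}\widetilde{H}_k, \tag{9.100} $$

where

$$ \widetilde{H}_k = \begin{pmatrix}
h_{11} & h_{12} & h_{13} & h_{14} & \dots & h_{1,k-2} & h_{1,k-1} & h_{1k} \\
h_{21} & h_{22} & h_{23} & h_{24} & \dots & h_{2,k-2} & h_{2,k-1} & h_{2k} \\
0 & h_{32} & h_{33} & h_{34} & \dots & h_{3,k-2} & h_{3,k-1} & h_{3k} \\
0 & 0 & h_{43} & h_{44} & \dots & h_{4,k-2} & h_{4,k-1} & h_{4k} \\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \dots & h_{k-1,k-2} & h_{k-1,k-1} & h_{k-1,k} \\
0 & 0 & 0 & 0 & \dots & 0 & h_{k,k-1} & h_{kk} \\
0 & 0 & 0 & 0 & \dots & 0 & 0 & h_{k+1,k}
\end{pmatrix} \tag{9.101} $$

is the $(k+1) \times k$ matrix formed by appending the indicated bottom row to $H_k$.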
Finally, we note the useful formula

$$ Q_k^T\mathbf{v} = \|\mathbf{v}\|\,\mathbf{e}_1, \tag{9.102} $$

with $\mathbf{e}_1 = (1, 0, 0, \dots, 0)^T \in \mathbb{R}^k$ the first standard basis vector. This is a consequence of
the orthonormality of the Arnoldi vectors $\mathbf{u}_1, \dots, \mathbf{u}_k$, which form the rows of $Q_k^T$, along
with the fact that $\mathbf{v} = \|\mathbf{v}\|\,\mathbf{u}_1$.

Remark. In numerical applications, the best results are obtained by maximizing the
stabilization order of the Krylov subspaces generated by the initial vector, and so a random
choice of the initial vector $\mathbf{v}$, or, equivalently, the initial unit vector $\mathbf{u}_1$, is preferred so as to
minimize chances of low order degeneration and consequent inaccuracies. In the unlucky
event that stabilization occurs prematurely, one should restart the method with a different
choice of initial vector, [70].


The  Full  Orthogonalization  Method 

Krylov subspaces can also be applied to generate powerful semi-direct iterative algorithms
for solving linear systems. There are two different approaches. The first starts with the
concept of a weak or Galerkin formulation of a linear system, which is the elementary
observation that the only vector that is orthogonal to every vector in an inner product
space is the zero vector; see Exercise 3.1.10(a). As above, we concentrate on the case
$V = \mathbb{R}^n$ with the standard dot product. The observation means that $\mathbf{x} \in \mathbb{R}^n$ solves the
linear system $A\mathbf{x} = \mathbf{b}$ if and only if

$$ \mathbf{v}^T(A\mathbf{x} - \mathbf{b}) = 0 \qquad \text{for all} \qquad \mathbf{v} \in \mathbb{R}^n. \tag{9.103} $$

Solution techniques based on this formulation were first studied in depth, in the context
of the mechanics of thin elastic plates, by the Russian engineer Boris Galerkin in the first
half of the twentieth century, and often bear his name.

In the case of linear algebraic systems, the Galerkin formulation per se does not add
anything to what we already know. However, it becomes important for the numerical
approximation of solutions by restricting (9.103) to a smaller-dimensional subspace $V \subset \mathbb{R}^n$.
Specifically, one seeks a vector $\mathbf{x} \in V$ such that the Galerkin formulation (9.103) holds for
all $\mathbf{v} \in V$. In other words, the approximate solution is the vector $\mathbf{x} \in V$ such that the
residual $\mathbf{r} = \mathbf{b} - A\mathbf{x}$ is orthogonal to the subspace $V$. With a suitably inspired choice of
the subspace $V$, the Galerkin formulation may well provide a decent approximation to the
actual solution.

Remark. One can easily adapt the Galerkin formulation to general linear systems $L[u] = f$,
where $L\colon U \to V$ is any linear operator between vector spaces. The corresponding weak
formulation, as described in Exercise 7.5.9, has become an extremely important tool in
the modern mathematical analysis of differential equations, which take place in
infinite-dimensional function spaces. Moreover, the restriction of the weak formulation to a
finite-dimensional subspace $V \subset U$ is the basis of the powerful finite element solution method
for boundary value problems; see [8, 61] for details.


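Remark. The question of existence and uniqueness of the Galerkin approximate solution
depends upon the matrix $A$ and the choice of subspace $V$. Given a basis $\mathbf{v}_1, \dots, \mathbf{v}_k$ of
$V$, we express $\mathbf{x} = y_1\mathbf{v}_1 + \cdots + y_k\mathbf{v}_k = S\mathbf{y}$, where $S = (\,\mathbf{v}_1 \ \mathbf{v}_2 \ \dots \ \mathbf{v}_k\,)$ is the $n \times k$
matrix whose columns are the basis vectors, while $\mathbf{y} = (y_1, y_2, \dots, y_k)^T \in \mathbb{R}^k$ contains the
coordinates of $\mathbf{x} = S\mathbf{y} \in V$ with respect to the given basis. Then the Galerkin conditions
on $V$ can be written as

$$ \mathbf{v}^T(A\mathbf{x} - \mathbf{b}) = \mathbf{v}^T(AS\mathbf{y} - \mathbf{b}) = 0 \qquad \text{for all} \qquad \mathbf{v} \in V. $$

Expressing $\mathbf{v} = S\mathbf{z}$ for $\mathbf{z} \in \mathbb{R}^k$ in the same fashion, this becomes

$$ \mathbf{z}^T S^T(AS\mathbf{y} - \mathbf{b}) = \mathbf{z}^T(S^TAS\,\mathbf{y} - S^T\mathbf{b}) = 0 \qquad \text{for all} \qquad \mathbf{z} \in \mathbb{R}^k, $$

which clearly holds if and only if

$$ S^TAS\,\mathbf{y} = S^T\mathbf{b}. \tag{9.104} $$

This is a linear system of $k$ equations in the $k$ unknowns $\mathbf{y} \in \mathbb{R}^k$. Thus, a solution exists
and is uniquely determined if and only if the $k \times k$ coefficient matrix $S^TAS$ is nonsingular,
which requires, at the very least, $\operatorname{rank} A \geq k$, and places additional constraints on $S$.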

As you may suspect, in the case of a linear algebraic system, a particularly good choice of
subspace for a Galerkin approximation to the solution is a Krylov subspace. The resulting
solution method is known as the Full Orthogonalization Method, abbreviated FOM, [70].
In detail, the method proceeds as follows. Let $V^{(k)} \subset \mathbb{R}^n$ be the order $k$ Krylov subspace
generated by the right-hand side $\mathbf{b}$, and thus spanned by $\mathbf{b}, A\mathbf{b}, A^2\mathbf{b}, \dots, A^{k-1}\mathbf{b}$. The $k$th
Krylov approximation to the solution $\mathbf{x}$ is the vector $\mathbf{x}_k \in V^{(k)}$ whose residual $\mathbf{r}_k = \mathbf{b} - A\mathbf{x}_k$
satisfies the Galerkin condition of being orthogonal to the subspace:

$$ \mathbf{v} \cdot \mathbf{r}_k = \mathbf{v}^T(\mathbf{b} - A\mathbf{x}_k) = 0 \qquad \text{for all} \qquad \mathbf{v} \in V^{(k)}. $$

In particular, the initial approximation is taken to be $\mathbf{x}_0 = \mathbf{0} \in V^{(0)}$, with residual vector
$\mathbf{r}_0 = \mathbf{b} - A\mathbf{x}_0 = \mathbf{b}$. Moreover, Lemma 9.50 implies $\mathbf{r}_k = \mathbf{b} - A\mathbf{x}_k \in V^{(k+1)}$. Since it is
orthogonal to $V^{(k)}$, it must be a scalar multiple of the $(k+1)$st Arnoldi vector:

$$ \mathbf{r}_k = c_{k+1}\mathbf{u}_{k+1}, \qquad \text{where} \qquad c_{k+1} = \pm\|\mathbf{r}_k\|. \tag{9.105} $$

This implies that the residual vectors are also mutually orthogonal:

$$ \mathbf{r}_j \cdot \mathbf{r}_k = 0, \qquad j \neq k. \tag{9.106} $$

Using the orthonormal Arnoldi basis vectors $\mathbf{u}_1, \dots, \mathbf{u}_k \in V^{(k)}$, which form the columns of
the matrix $Q_k$, we write $\mathbf{x}_k = Q_k\mathbf{y}_k$, and hence, recalling (9.99), equation (9.104) becomes

$$ Q_k^TAQ_k\,\mathbf{y}_k = H_k\mathbf{y}_k = Q_k^T\mathbf{b} = \|\mathbf{b}\|\,\mathbf{e}_1, \tag{9.107} $$

where $H_k$ is the upper Hessenberg matrix (9.99), and we use (9.102) (with $\mathbf{b}$ replacing $\mathbf{v}$,
as per our initial supposition) to obtain the final expression. Solving the resulting system
(9.107), assuming $H_k$ is invertible, for $\mathbf{y}_k = \|\mathbf{b}\|\,H_k^{-1}\mathbf{e}_1$ produces the $k$th order Krylov
approximation to the solution

$$ \mathbf{x}_k = Q_k\mathbf{y}_k = \|\mathbf{b}\|\,Q_kH_k^{-1}\mathbf{e}_1. \tag{9.108} $$


Of course, in applications one does not explicitly compute the inverse $H_k^{-1}$ but rather
uses, say, its LU factorization $H_k = L_kU_k$ (assuming regularity), coupled with forward
and back substitution to solve (9.107). Moreover, according to Exercise 9.5.27, the lower
unitriangular factor $L_k$ is bidiagonal, meaning that all entries not on the diagonal or
subdiagonal are zero. Furthermore, because the upper left $(k-1) \times (k-1)$ entries of $H_k$
are the same as those of its predecessor, whose factorization $H_{k-1} = L_{k-1}U_{k-1}$ can be
assumed to already be known, we can quickly factorize $H_k$. Namely, we write

$$ H_k = \begin{pmatrix} H_{k-1} & \mathbf{f}_k \\ \mathbf{g}_k^T & h_{kk} \end{pmatrix}, \qquad L_k = \begin{pmatrix} L_{k-1} & \mathbf{0} \\ \mathbf{m}_k^T & 1 \end{pmatrix}, \qquad U_k = \begin{pmatrix} U_{k-1} & \mathbf{z}_k \\ \mathbf{0}^T & u_{kk} \end{pmatrix}, $$

where $\mathbf{f}_k, \mathbf{g}_k, \mathbf{m}_k, \mathbf{z}_k \in \mathbb{R}^{k-1}$, while $h_{kk}, u_{kk} \in \mathbb{R}$. Moreover, since $H_k$ is upper Hessenberg,
both $\mathbf{g}_k = h_{k,k-1}\mathbf{e}_{k-1}$ and $\mathbf{m}_k = l_{k,k-1}\mathbf{e}_{k-1}$ are multiples of the $(k-1)$st basis vector
$\mathbf{e}_{k-1} \in \mathbb{R}^{k-1}$. Multiplying out $H_k = L_kU_k$ implies that we need only solve a single triangular linear
system, via forward substitution, along with a pair of scalar linear equations, resulting in

$$ L_{k-1}\mathbf{z}_k = \mathbf{f}_k, \qquad l_{k,k-1} = \frac{h_{k,k-1}}{u_{k-1,k-1}}, \qquad u_{kk} = h_{kk} - l_{k,k-1}\,u_{k-1,k}. \tag{9.109} $$
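The bordering step (9.109) is short enough to write out directly. The following Python sketch is a hedged illustration: the function name `extend_lu`, its argument names, and the $3 \times 3$ Hessenberg test matrix are assumptions made here, and no pivoting is performed, so the matrix must be regular. Given the factors of $H_{k-1}$, it returns those of $H_k$ using one forward substitution and two scalar formulas.

```python
def extend_lu(L, U, f, h_sub, h_diag):
    """Extend H_{k-1} = L U to a factorization of the bordered Hessenberg matrix H_k.

    f      : first k-1 entries of the new last column of H_k
    h_sub  : the new subdiagonal entry h_{k,k-1}
    h_diag : the new diagonal entry h_{kk}
    L is lower unitriangular and U upper triangular, both (k-1) x (k-1)."""
    n = len(L)
    # Forward substitution for L z = f (L has unit diagonal).
    z = []
    for i in range(n):
        z.append(f[i] - sum(L[i][j] * z[j] for j in range(i)))
    l_new = h_sub / U[n - 1][n - 1]          # l_{k,k-1}
    u_new = h_diag - l_new * z[n - 1]        # u_{kk}; z[n-1] is u_{k-1,k}
    L_ext = [row + [0.0] for row in L] + [[0.0] * (n - 1) + [l_new, 1.0]]
    U_ext = [row + [z[i]] for i, row in enumerate(U)] + [[0.0] * n + [u_new]]
    return L_ext, U_ext

# Hypothetical upper Hessenberg matrix, factored one border at a time:
#     H = [[2, 0, 1],
#          [1, 4, 0],
#          [0, 1, 3]]
L, U = [[1.0]], [[2.0]]
L, U = extend_lu(L, U, [0.0], 1.0, 4.0)
L, U = extend_lu(L, U, [1.0, 0.0], 1.0, 3.0)
print(L)   # the lower factor is bidiagonal
print(U)
```

Each call costs only a triangular solve of size $k-1$, instead of refactoring the whole matrix.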


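Remark. Suppose you happen to know a good initial guess $\mathbf{x}_0$ for the solution. The
convergence can then be sped up by setting $\widetilde{\mathbf{x}} = \mathbf{x} - \mathbf{x}_0$, which converts the original
system to $A\widetilde{\mathbf{x}} = \widetilde{\mathbf{b}}$, where $\widetilde{\mathbf{b}} = \mathbf{r}_0 = \mathbf{b} - A\mathbf{x}_0$ is the initial residual. On applying the
FOM algorithm to the modified system, the resulting $\widetilde{\mathbf{x}}_k \in \widetilde{V}^{(k)}$ in the Krylov subspaces
generated by $\widetilde{\mathbf{b}}$ provide the improved approximations $\mathbf{x}_k = \widetilde{\mathbf{x}}_k + \mathbf{x}_0$ to the solution $\mathbf{x}$ to
the original system.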
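The whole FOM step, from Arnoldi basis to approximate solution, can be sketched compactly. The Python code below is a minimal illustration under stated assumptions: the function name `fom` is hypothetical, the Krylov subspaces are assumed not to stabilize before order $k$, the Hessenberg matrix $H_k$ is assumed regular (no pivoting is used), and $H_k$ is refactored from scratch rather than updated incrementally. The symmetric test matrix is likewise chosen here for illustration.

```python
import math

def matvec(A, v):
    """Multiply the square matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fom(A, b, k):
    """k-th FOM approximation to the solution of A x = b, with x_0 = 0."""
    beta = math.sqrt(dot(b, b))
    U = [[x / beta for x in b]]              # Arnoldi vectors u_1, u_2, ...
    H = [[0.0] * k for _ in range(k)]        # k x k upper Hessenberg matrix H_k
    for col in range(k):
        w = matvec(A, U[col])
        for j in range(col + 1):             # stabilized Gram-Schmidt sweep
            H[j][col] = dot(U[j], w)
            w = [wi - H[j][col] * uj for wi, uj in zip(w, U[j])]
        if col + 1 < k:
            h = math.sqrt(dot(w, w))         # subdiagonal entry, assumed nonzero
            H[col + 1][col] = h
            U.append([wi / h for wi in w])
    # Solve H_k y = ||b|| e_1 by elimination; only subdiagonal entries need removal.
    rhs = [beta] + [0.0] * (k - 1)
    for i in range(k - 1):
        factor = H[i + 1][i] / H[i][i]
        H[i + 1] = [a - factor * c for a, c in zip(H[i + 1], H[i])]
        rhs[i + 1] -= factor * rhs[i]
    y = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        s = sum(H[i][j] * y[j] for j in range(i + 1, k))
        y[i] = (rhs[i] - s) / H[i][i]
    # x_k = Q_k y_k
    return [sum(y[j] * U[j][i] for j in range(k)) for i in range(len(b))]

A = [[3.0, -1.0, 0.0],
     [-1.0, 2.0, 1.0],
     [0.0, 1.0, 1.0]]
b = [1.0, 2.0, -1.0]
for k in (1, 2, 3):
    print(k, [round(x, 4) for x in fom(A, b, k)])
```

For a $3 \times 3$ system the third approximation solves the system up to round-off, which reflects the semi-direct character of the method.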


The  Conjugate  Gradient  Method 


The  most  important  case  of  the  FOM  algorithm  is  that  in  which  the  coefficient  matrix  A  is 
symmetric,  and  hence,  as  noted  above,  Hk  is  symmetric,  tridiagonal,  which  means  that  the 
system  (9.107)  can  be  quickly  solved  by  the  tridiagonal  version  of  Gaussian  Elimination, 
cf.  (1.69-70).  In  particular,  if  A  >  0  is  positive  definite,  then  so  is  Hk  >  0,  and  the 
resulting  algorithm  is  known  as  the  Conjugate  Gradient  Method ,  often  abbreviated  CG, 
first  introduced  in  1952  by  Hestenes  and  Stiefel,  [39].  It  is  now  the  most  widely  used 
method  for  solving  linear  systems  with  positive  definite  coefficient  matrices,  e.g.,  those 
arising  in  the  numerical  solution  to  boundary  value  problems  for  elliptic  systems  of  partial 
differential equations, [8, 61].

There is a simpler direct way to formulate the CG algorithm, which is the one that is
used in practice. First, we apply Theorems 1.29 and 1.34 to refine the factorization of the
tridiagonal matrix:

$$ H_k = L_kD_kL_k^T, \tag{9.110} $$

where $L_k$ is lower unitriangular and $D_k$ is diagonal. Let $C_k$ be the $k \times k$ diagonal matrix
with diagonal entries $c_j = \pm\|\mathbf{r}_{j-1}\|$ for $j = 1, \dots, k$, so that, according to (9.105),

$$ Q_kC_k = R_k = (\,\mathbf{r}_0 \ \mathbf{r}_1 \ \dots \ \mathbf{r}_{k-1}\,) $$

is the matrix of residual vectors. Define

$$ W_k = (\,\mathbf{w}_1 \ \mathbf{w}_2 \ \dots \ \mathbf{w}_k\,) = Q_kL_k^{-T}C_k = R_kV_k, \tag{9.111} $$

where the columns $\mathbf{w}_1, \dots, \mathbf{w}_k$ of $W_k$ are known as the conjugate directions, and where

$$ V_k^{-1} = C_k^{-1}L_k^TC_k = \begin{pmatrix} 1 & -s_1 & & & \\ & 1 & -s_2 & & \\ & & \ddots & \ddots & \\ & & & 1 & -s_{k-1} \\ & & & & 1 \end{pmatrix} $$

is upper unitriangular. Note that, for $j \geq 1$, the $(j+1)$st column of the matrix equation
$R_k = W_kV_k^{-1}$ implies

$$ \mathbf{r}_j = \mathbf{w}_{j+1} - s_j\mathbf{w}_j. \tag{9.112} $$

We claim that the vectors $\mathbf{w}_1, \dots, \mathbf{w}_k$ are conjugate, which means that they are mutually
orthogonal with respect to the inner product† $\langle\!\langle \mathbf{v}, \mathbf{w} \rangle\!\rangle = \mathbf{v}^TA\mathbf{w}$ induced by $A$, and so

$$ \langle\!\langle \mathbf{w}_i, \mathbf{w}_j \rangle\!\rangle = \mathbf{w}_i^TA\mathbf{w}_j = 0, \qquad i \neq j. \tag{9.113} $$

To verify (9.113), we use (9.110, 111) to compute the corresponding Gram matrix, whose
entries are the inner products:

$$ W_k^TAW_k = C_kL_k^{-1}Q_k^TAQ_kL_k^{-T}C_k = C_kL_k^{-1}H_kL_k^{-T}C_k = C_kD_kC_k = C_k^2D_k, $$

the final result being a diagonal matrix. We deduce that all the off-diagonal entries of the
Gram matrix $W_k^TAW_k$ vanish, which proves (9.113).

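Let us write the $k$th approximate solution $\mathbf{x}_k \in V^{(k)}$ in the form

$$ \mathbf{x}_k = Q_k\mathbf{y}_k = W_k\mathbf{t}_k = t_1\mathbf{w}_1 + \cdots + t_k\mathbf{w}_k, \qquad \text{where} \qquad \mathbf{t}_k = (t_1, \dots, t_k)^T = C_k^{-1}L_k^T\mathbf{y}_k. $$

As a consequence of (9.112) with‡ $k$ replacing $j$, along with (9.113), its residual vector
$\mathbf{r}_k = \mathbf{b} - A\mathbf{x}_k$ satisfies

$$ \begin{aligned} \langle\!\langle \mathbf{r}_k, \mathbf{w}_k \rangle\!\rangle &= \langle\!\langle \mathbf{w}_{k+1} - s_k\mathbf{w}_k, \mathbf{w}_k \rangle\!\rangle = -s_k\,\langle\!\langle \mathbf{w}_k, \mathbf{w}_k \rangle\!\rangle, \\ \langle\!\langle \mathbf{r}_k, \mathbf{w}_{k+1} \rangle\!\rangle &= \langle\!\langle \mathbf{w}_{k+1} - s_k\mathbf{w}_k, \mathbf{w}_{k+1} \rangle\!\rangle = \langle\!\langle \mathbf{w}_{k+1}, \mathbf{w}_{k+1} \rangle\!\rangle. \end{aligned} \tag{9.114} $$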


The $(k+1)$st approximation can be written in the iterative form

$$ \mathbf{x}_{k+1} = \mathbf{x}_k + t_{k+1}\mathbf{w}_{k+1}, \tag{9.115} $$

meaning that we move from $\mathbf{x}_k$ to $\mathbf{x}_{k+1}$ by adding a suitable scalar multiple of the conjugate
direction $\mathbf{w}_{k+1}$. The updated residual is

$$ \mathbf{r}_{k+1} = \mathbf{b} - A\mathbf{x}_{k+1} = \mathbf{b} - A\mathbf{x}_k - t_{k+1}A\mathbf{w}_{k+1} = \mathbf{r}_k - t_{k+1}A\mathbf{w}_{k+1}. \tag{9.116} $$

† Of course, (9.113) defines a genuine inner product only if $A > 0$. On the other hand, the ensuing
calculations only require symmetry of the coefficient matrix, although there is no guarantee that
the resulting linear systems can be solved when $A$ is not positive definite.

‡ To be completely accurate, the resulting equation appears as the $(k+1)$st column of the
subsequent matrix equations $R_l = W_lV_l^{-1}$ for all $l \geq k+1$.

Conjugate Gradient Method for Solving $A\mathbf{x} = \mathbf{b}$ with $A > 0$

start
    choose an initial guess $\mathbf{x}_0$, e.g., $\mathbf{x}_0 = \mathbf{0}$
    for $k = 0$ to $m - 1$
        set $\mathbf{r}_k = \mathbf{b} - A\mathbf{x}_k$
        if $\mathbf{r}_k = \mathbf{0}$ print “$\mathbf{x}_k$ is the exact solution”; end
        if $k = 0$ set $\mathbf{w}_1 = \mathbf{r}_0$ else set $\mathbf{w}_{k+1} = \mathbf{r}_k + \dfrac{\|\mathbf{r}_k\|^2}{\|\mathbf{r}_{k-1}\|^2}\,\mathbf{w}_k$
        set $\mathbf{x}_{k+1} = \mathbf{x}_k + \dfrac{\|\mathbf{r}_k\|^2}{\mathbf{w}_{k+1}^TA\mathbf{w}_{k+1}}\,\mathbf{w}_{k+1}$
    next $k$
end


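Orthogonality of the residuals, (9.106), coupled with (9.114) and (9.116), implies

$$ 0 = \mathbf{r}_k^T\mathbf{r}_{k+1} = \|\mathbf{r}_k\|^2 - t_{k+1}\,\mathbf{r}_k^TA\mathbf{w}_{k+1} = \|\mathbf{r}_k\|^2 - t_{k+1}\,\langle\!\langle \mathbf{r}_k, \mathbf{w}_{k+1} \rangle\!\rangle = \|\mathbf{r}_k\|^2 - t_{k+1}\,\langle\!\langle \mathbf{w}_{k+1}, \mathbf{w}_{k+1} \rangle\!\rangle, $$

hence

$$ t_{k+1} = \frac{\|\mathbf{r}_k\|^2}{\langle\!\langle \mathbf{w}_{k+1}, \mathbf{w}_{k+1} \rangle\!\rangle} = \frac{\mathbf{r}_k^T\mathbf{r}_k}{\mathbf{w}_{k+1}^TA\mathbf{w}_{k+1}}. \tag{9.117} $$

Finally, using (9.106) and (9.116, 117), with $k$ replaced by $k-1$, yields

$$ \|\mathbf{r}_k\|^2 = \mathbf{r}_k^T(\mathbf{r}_{k-1} - t_kA\mathbf{w}_k) = -t_k\,\mathbf{r}_k^TA\mathbf{w}_k = -t_k\,\langle\!\langle \mathbf{r}_k, \mathbf{w}_k \rangle\!\rangle = -\frac{\|\mathbf{r}_{k-1}\|^2}{\langle\!\langle \mathbf{w}_k, \mathbf{w}_k \rangle\!\rangle}\,\langle\!\langle \mathbf{r}_k, \mathbf{w}_k \rangle\!\rangle. $$

Thus, referring back to (9.114),

$$ s_k = -\frac{\langle\!\langle \mathbf{r}_k, \mathbf{w}_k \rangle\!\rangle}{\langle\!\langle \mathbf{w}_k, \mathbf{w}_k \rangle\!\rangle} = \frac{\|\mathbf{r}_k\|^2}{\|\mathbf{r}_{k-1}\|^2}. \tag{9.118} $$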


The iterative equations (9.115, 117, 118) constitute the Conjugate Gradient algorithm,
which is summarized in the accompanying pseudocode. The algorithm can also be applied
if $A$ is merely symmetric, although it may break down if the denominator $\mathbf{w}_{k+1}^TA\mathbf{w}_{k+1} = 0$,
which will not occur in the positive definite case (why?). At each stage, $\mathbf{x}_k$ is the current
approximation to the solution. The initial guess $\mathbf{x}_0$ can be chosen by the user, with $\mathbf{x}_0 = \mathbf{0}$
the default. The number of iterations $m \leq n$ can be specified in advance; alternatively,
one can impose a stopping criterion based on the size of the residual vector, $\|\mathbf{r}_k\|$, or
on the amount of change between successive iterates, as measured by, say, their
distance $\|\mathbf{x}_{k+1} - \mathbf{x}_k\|$ in either the Euclidean norm or the $\infty$ norm. Because the process
is semi-direct, eventually $\mathbf{r}_k = \mathbf{0}$ for some $k \leq n$, and so, in the absence of round-off errors,
the result will be the exact solution to the system. Of course, in examples, one would
not carry through the algorithm to the bitter end, since a decent approximation to the
solution is typically obtained with relatively few iterations. For further developments and
applications, see [21, 66, 70, 89].

Remark. The reason for the name “conjugate gradient” is as follows. The term gradient
stems from the minimization principle characterizing the solutions to linear systems with
positive definite coefficient matrices. According to Theorem 5.2, if $A > 0$, the solution to
the linear system $A\mathbf{x} = \mathbf{b}$ is the unique minimizer of the quadratic function

$$ p(\mathbf{x}) = \tfrac{1}{2}\,\mathbf{x}^TA\mathbf{x} - \mathbf{x}^T\mathbf{b}. \tag{9.119} $$


One approach to solving the system is to try to successively minimize $p(\mathbf{x})$ as much as
possible. Suppose we find ourselves at a point $\mathbf{x}$ that is not the minimizer. In which
direction should we travel? Multivariable calculus tells us that the gradient vector $\nabla p(\mathbf{x})$
of a function points in the direction of its steepest increase at the point, while its negative
$-\nabla p(\mathbf{x})$ points in the direction of steepest decrease, [2, 78]. The gradient of the particular
quadratic function (9.119) is easily found:

$$ -\nabla p(\mathbf{x}) = \mathbf{b} - A\mathbf{x} = \mathbf{r}. $$

Thus, the residual vector specifies the direction of steepest decrease in the quadratic
function, and is thus a good choice of direction in which to head off in search of the true
minimizer. (If one views the graph of $p$ as a mountain range, then, at any given location
$\mathbf{x}$ with elevation $p(\mathbf{x})$, the negative gradient $-\nabla p(\mathbf{x}) = \mathbf{r}$ points in the steepest downhill
direction.) This idea leads to the gradient descent algorithm, in which each successive
approximation $\mathbf{x}_k$ to the solution is obtained by going a certain distance in the residual
direction:

$$ \mathbf{x}_{k+1} = \mathbf{x}_k + d_k\mathbf{r}_k, \qquad \text{where} \qquad \mathbf{r}_k = \mathbf{b} - A\mathbf{x}_k. \tag{9.120} $$

The scalar factor $d_k$ is to be specified so that the resulting $p(\mathbf{x}_{k+1})$ is as small as possible;
in Exercise 9.6.14 you are asked to find this value. Gradient descent is a reasonable
algorithm, and will lead to the solution in favorable situations. It is also effectively used to
find minima of more general nonlinear functions. However, in certain circumstances, the
iterative method based on gradient descent can take a long time to converge to an accurate
approximation to the solution, and so is typically not competitive. To obtain the speedier
Conjugate Gradient algorithm, we modify the gradient descent idea by requiring that the
next descent direction be chosen so that it is conjugate to the preceding directions, i.e.,
satisfies (9.113). This idea can be used to produce an independent direct derivation of the
Conjugate Gradient algorithm.


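Example 9.51. Consider the linear system $A\mathbf{x} = \mathbf{b}$ with

$$ A = \begin{pmatrix} 3 & -1 & 0 \\ -1 & 2 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}. $$

The exact solution is $\mathbf{x}^\star = (2, 5, -6)^T$. Let us implement the method of conjugate
gradients, starting with the initial guess $\mathbf{x}_0 = (0, 0, 0)^T$. The corresponding residual
vector is merely $\mathbf{r}_0 = \mathbf{b} - A\mathbf{x}_0 = \mathbf{b} = (1, 2, -1)^T$. The first conjugate direction is
$\mathbf{w}_1 = \mathbf{r}_0 = (1, 2, -1)^T$, and we use formula (9.115) to obtain the updated approximation
to the solution

$$ \mathbf{x}_1 = \mathbf{x}_0 + \frac{\|\mathbf{r}_0\|^2}{\langle\!\langle \mathbf{w}_1, \mathbf{w}_1 \rangle\!\rangle}\,\mathbf{w}_1 = \frac{6}{4}\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} = \begin{pmatrix} \tfrac{3}{2} \\ 3 \\ -\tfrac{3}{2} \end{pmatrix}, $$

noting that $\langle\!\langle \mathbf{w}_1, \mathbf{w}_1 \rangle\!\rangle = \mathbf{w}_1^TA\mathbf{w}_1 = 4$. For the next stage of the algorithm, we compute
the corresponding residual $\mathbf{r}_1 = \mathbf{b} - A\mathbf{x}_1 = \left(-\tfrac{1}{2}, -1, -\tfrac{5}{2}\right)^T$. The conjugate direction is

$$ \mathbf{w}_2 = \mathbf{r}_1 + \frac{\|\mathbf{r}_1\|^2}{\|\mathbf{r}_0\|^2}\,\mathbf{w}_1 = \begin{pmatrix} -\tfrac{1}{2} \\ -1 \\ -\tfrac{5}{2} \end{pmatrix} + \frac{15/2}{6}\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} = \begin{pmatrix} \tfrac{3}{4} \\ \tfrac{3}{2} \\ -\tfrac{15}{4} \end{pmatrix}, $$

which, as designed, satisfies the conjugacy condition $\langle\!\langle \mathbf{w}_1, \mathbf{w}_2 \rangle\!\rangle = \mathbf{w}_1^TA\mathbf{w}_2 = 0$. Each
entry of the ensuing approximation

$$ \mathbf{x}_2 = \mathbf{x}_1 + \frac{\|\mathbf{r}_1\|^2}{\langle\!\langle \mathbf{w}_2, \mathbf{w}_2 \rangle\!\rangle}\,\mathbf{w}_2 = \begin{pmatrix} \tfrac{3}{2} \\ 3 \\ -\tfrac{3}{2} \end{pmatrix} + \frac{15/2}{27/4}\begin{pmatrix} \tfrac{3}{4} \\ \tfrac{3}{2} \\ -\tfrac{15}{4} \end{pmatrix} = \begin{pmatrix} \tfrac{7}{3} \\ \tfrac{14}{3} \\ -\tfrac{17}{3} \end{pmatrix} \approx \begin{pmatrix} 2.3333 \\ 4.6667 \\ -5.6667 \end{pmatrix} $$

is now within $\tfrac{1}{3}$ of the exact solution $\mathbf{x}^\star$.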

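Since we are dealing with a $3 \times 3$ system, we will recover the exact solution by one more
iteration of the algorithm. The new residual is $\mathbf{r}_2 = \mathbf{b} - A\mathbf{x}_2 = \left(-\tfrac{4}{3}, \tfrac{2}{3}, 0\right)^T$. The final
conjugate direction is

$$ \mathbf{w}_3 = \mathbf{r}_2 + \frac{\|\mathbf{r}_2\|^2}{\|\mathbf{r}_1\|^2}\,\mathbf{w}_2 = \begin{pmatrix} -\tfrac{4}{3} \\ \tfrac{2}{3} \\ 0 \end{pmatrix} + \frac{20/9}{15/2}\begin{pmatrix} \tfrac{3}{4} \\ \tfrac{3}{2} \\ -\tfrac{15}{4} \end{pmatrix} = \begin{pmatrix} -\tfrac{10}{9} \\ \tfrac{10}{9} \\ -\tfrac{10}{9} \end{pmatrix}, $$

which, as you can check, is conjugate to both $\mathbf{w}_1$ and $\mathbf{w}_2$. The solution is obtained from

$$ \mathbf{x}_3 = \mathbf{x}_2 + \frac{\|\mathbf{r}_2\|^2}{\langle\!\langle \mathbf{w}_3, \mathbf{w}_3 \rangle\!\rangle}\,\mathbf{w}_3 = \begin{pmatrix} \tfrac{7}{3} \\ \tfrac{14}{3} \\ -\tfrac{17}{3} \end{pmatrix} + \frac{20/9}{200/27}\begin{pmatrix} -\tfrac{10}{9} \\ \tfrac{10}{9} \\ -\tfrac{10}{9} \end{pmatrix} = \begin{pmatrix} 2 \\ 5 \\ -6 \end{pmatrix}. $$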

The  Generalized  Minimal  Residual  Method 

A natural alternative to the Galerkin weak approach is to try to directly minimize the norm
of the residual $\mathbf{r} = \mathbf{b} - A\,\mathbf{x}$ when the approximate solution $\mathbf{x}$ is required to lie in a specified
subspace $\mathbf{x} \in V$. When $V$ is a Krylov subspace, this idea results in the Generalized Minimal
Residual Method (usually abbreviated GMRES), which was developed by the Algerian and
American mathematicians/computer scientists† Yousef Saad and Martin Schultz, [71].

As in the FOM Method, we choose the Krylov subspaces generated by $\mathbf{b}$, the right-hand
side of the system to be solved, but now seek the vector $\mathbf{x}_k \in V^{(k)}$ that minimizes the
Euclidean norm $\|A\,\mathbf{x} - \mathbf{b}\|$ over all vectors $\mathbf{x} \in V^{(k)}$. This approach corresponds to the
initial approximation $\mathbf{x}_0 = \mathbf{0} \in V^{(1)}$; as before, if we know a better initial guess $\mathbf{x}_0$, we set
$\tilde{\mathbf{x}} = \mathbf{x} - \mathbf{x}_0$, which converts the original system to $A\,\tilde{\mathbf{x}} = \tilde{\mathbf{b}}$, where $\tilde{\mathbf{b}} = \mathbf{r}_0 = \mathbf{b} - A\,\mathbf{x}_0$ is the
initial residual, and then apply the method to the new system.

† Coincidentally, the first author of the book you are reading is at the same university,
Minnesota, as Saad, and had the same thesis advisor, Garrett Birkhoff, as Schultz.

Again, we express the vectors

$$\mathbf{x}_k = y_1 \mathbf{u}_1 + \cdots + y_k \mathbf{u}_k = Q_k\,\mathbf{y}_k \in V^{(k)}$$

as linear combinations of the orthonormal Arnoldi basis vectors, with coefficients
$\mathbf{y}_k = (y_1, \ldots, y_k)^T \in \mathbb{R}^k$. In view of (9.100) and (9.97), with $k$ replaced by $k+1$, the squared
residual norm is given by


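$$\begin{aligned}
\|A\,\mathbf{x}_k - \mathbf{b}\|^2 &= \|A\,Q_k \mathbf{y}_k - \mathbf{b}\|^2 = \|Q_{k+1} H_k \mathbf{y}_k - \mathbf{b}\|^2 \\
&= (Q_{k+1} H_k \mathbf{y}_k - \mathbf{b})^T (Q_{k+1} H_k \mathbf{y}_k - \mathbf{b}) = \mathbf{y}_k^T H_k^T H_k \mathbf{y}_k - 2\,\mathbf{y}_k^T H_k^T Q_{k+1}^T \mathbf{b} + \|\mathbf{b}\|^2 \\
&= \mathbf{y}_k^T H_k^T H_k \mathbf{y}_k - 2\,\mathbf{y}_k^T H_k^T \mathbf{c}_k + \|\mathbf{c}_k\|^2 = \|H_k \mathbf{y}_k - \mathbf{c}_k\|^2,
\end{aligned} \tag{9.121}$$

where, according to (9.107), again with $k$ replaced by $k+1$,

$$\mathbf{c}_k = Q_{k+1}^T \mathbf{b} = \|\mathbf{b}\|\,\mathbf{e}_1 \in \mathbb{R}^{k+1}, \qquad \text{so that} \qquad \|\mathbf{c}_k\| = \|\mathbf{b}\|. \tag{9.122}$$

We deduce that minimizing $\|A\,\mathbf{x} - \mathbf{b}\|$ over all $\mathbf{x} \in V^{(k)}$ is the same as minimizing $\|H_k \mathbf{y} - \mathbf{c}_k\|$
over all $\mathbf{y} \in \mathbb{R}^k$. The latter is a standard least squares minimization problem,
whose solution $\mathbf{y}_k$ is found by solving the corresponding normal equations

$$H_k^T H_k\,\mathbf{y}_k = H_k^T \mathbf{c}_k = \|\mathbf{b}\|\,H_k^T \mathbf{e}_1 = \|\mathbf{b}\|\,(h_{11}, h_{12}, \ldots, h_{1k})^T. \tag{9.123}$$

Solving (9.123) produces the desired minimizer $\mathbf{x}_k = Q_k \mathbf{y}_k \in V^{(k)}$, and hence the desired
approximation to the solution to the original linear system.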

The result of this calculation is the Generalized Minimal Residual Method (GMRES)
algorithm. To successively approximate the solution to $A\,\mathbf{x} = \mathbf{b}$, on the $k$th iteration, we
set $\mathbf{c}_k = \|\mathbf{b}\|\,\mathbf{e}_1 \in \mathbb{R}^{k+1}$, and then perform the following steps:

(a) calculate $\mathbf{u}_k$ and $H_k$ using the Arnoldi Method;

(b) use least squares to find the vector $\mathbf{y} = \mathbf{y}_k$ that minimizes $\|H_k \mathbf{y} - \mathbf{c}_k\|$;

(c) let $\mathbf{x}_k = Q_k \mathbf{y}_k$ be the $k$th approximate solution.


The process is repeated until the residual norm $\|A\,\mathbf{x}_k - \mathbf{b}\| = \|H_k \mathbf{y}_k - \mathbf{c}_k\|$ is
below a pre-assigned threshold. Again, because of the iterative structure of the Krylov
vectors, and hence the upper Hessenberg matrices $H_k$, knowing the solution to the order
$k$ minimization problem allows one to rather quickly construct that of the order $k+1$
version. As with all Krylov methods, GMRES is a semi-direct method and hence, if
performed in exact arithmetic, will eventually produce the exact solution once the Krylov
stabilization order is reached. As with FOM/CG, this is rarely required, and one typically
imposes a stopping criterion based on either the norm of the residual vector or the size
of the difference between successive iterates $\|\mathbf{x}_{k+1} - \mathbf{x}_k\|$. The method works very well in
practice, particularly with the sparse coefficient matrices arising in many numerical solution
algorithms for partial differential equations and beyond, including finite difference, finite
element, collocation, and multipole expansion.
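Steps (a) to (c) can be sketched in a few lines of NumPy. This is an illustrative, unrestarted version under stated assumptions, not a production solver: the name `gmres_sketch`, the tolerances, and the demonstration system are all hypothetical, and the least squares problem in step (b) is solved with `numpy.linalg.lstsq` rather than by forming the normal equations (9.123) explicitly.

```python
import numpy as np

def gmres_sketch(A, b, tol=1e-10, max_iter=None):
    """Unrestarted GMRES with initial guess x0 = 0, following steps (a)-(c).

    Returns the approximate solution and the number of iterations used.
    """
    b = np.asarray(b, dtype=float)
    n = len(b)
    m = n if max_iter is None else max_iter
    beta = np.linalg.norm(b)
    if beta == 0.0:
        return np.zeros(n), 0          # b = 0 has the trivial solution x = 0
    Q = np.zeros((n, m + 1))           # columns hold the Arnoldi vectors u_1, u_2, ...
    H = np.zeros((m + 1, m))           # leading (k+1) x k block is the Hessenberg matrix
    Q[:, 0] = b / beta
    x = np.zeros(n)
    k = 0
    for k in range(1, m + 1):
        # (a) one Arnoldi step via modified Gram-Schmidt
        v = A @ Q[:, k - 1]
        for i in range(k):
            H[i, k - 1] = Q[:, i] @ v
            v = v - H[i, k - 1] * Q[:, i]
        H[k, k - 1] = np.linalg.norm(v)
        # (b) least squares: minimize || H y - c || with c = ||b|| e_1
        c = np.zeros(k + 1)
        c[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], c, rcond=None)
        # (c) k-th approximate solution
        x = Q[:, :k] @ y
        if np.linalg.norm(A @ x - b) < tol or H[k, k - 1] < 1e-14:
            break                      # converged, or the Krylov subspaces have stabilized
        Q[:, k] = v / H[k, k - 1]
    return x, k

# Hypothetical nonsymmetric test system (GMRES does not require symmetry).
A_demo = np.array([[4.0, 1.0, 0.0],
                   [2.0, 3.0, 1.0],
                   [0.0, 1.0, 2.0]])
b_demo = np.array([1.0, 2.0, 3.0])
x_demo, iters = gmres_sketch(A_demo, b_demo)
print(iters, x_demo, np.linalg.norm(A_demo @ x_demo - b_demo))
```

This sketch re-solves the small least squares problem from scratch at every step; the remark above about reusing the order $k$ solution is usually realized in practice by updating a QR factorization of $H_k$ with Givens rotations.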


Exercises 


9.6.1.  Find  an  orthonormal  basis  for  the  Krylov  subspaces  l/("21.  I-' ! 2 1  for  the  following 

matrices  and  vectors: 


(a)  A 


( 2 

2 

-i\ 

( 

-1\ 

( b )  A  = 

2 

-1 

0 

,  V  = 

2 

^2 

1 

3  J 

K 

o  ) 

0 

°\ 

/n 

1 

0 

0 

2 

-1 

,  v  = 

0 

1 

2  / 

\0j 
9.6.2. Let $\mathbf{v} = \mathbf{x} + i\,\mathbf{y}$ be an eigenvector corresponding to a complex, non-real eigenvalue of the
real $n \times n$ matrix $A$. (a) Prove that the Krylov subspaces $V^{(k)}$ for $k \ge 2$ generated by both
x  and  y  are  all  two-dimensional,  (b)  Is  the  converse  valid?  Specifically,  if  dimU^  =  2, 

then  all  V(k)  are  two-dimensional  for  k  >  1  and  spanned  by  the  real  and  imaginary  parts  of 
a  complex  eigenvector  of  A. 

0  9.6.3.  (a)  Prove  that  the  dimension  of  a  Krylov  subspace  is  bounded  by  the  degree  of  the 

minimal  polynomial  of  the  matrix  A,  as  defined  in  Exercise  8.6.23.  (b)  Is  there  always  a 
Krylov  subspace  whose  dimension  equals  the  degree  of  the  minimal  polynomial? 

9.6.4.  True  or  false:  A  Krylov  subspace  is  an  invariant  subspace  for  the  matrix  A. 

9.6.5.  Prove  that  the  invertibility  of  the  coefficient  matrix  STAS  in  (9.104)  depends  only  on 
the  subspace  V  and  not  on  the  choice  of  basis  thereof. 

0  9.6.6.  Prove  that  (9.92,  93,  94)  give  the  same  Arnoldi  vectors  uk  and  the  same  coefficients  hjk 
when  computed  exactly. 


9.6.7.  Solve  the  following  linear  systems  by  the  Conjugate  Gradient  Method,  keeping  track  of 
the  residual  vectors  and  solution  approximations  as  you  iterate. 


/  6 
-1 
-1 
V  5 


f  6 

2 

1\ 

/  1\ 

/  6 

-1 

—3  \ 

/-n 

.  (b)  2 

3 

-1 

u  = 

0 

.  (c) 

-1 

7 

4 

u  = 

-2 

Vi 

-1 

2  J 

V-2  / 

1-3 

4 

97 

V  7/ 

1 

5  \ 

/  1\ 

/  5 

1 

1 

n 

74\ 

1 

-1 

2 

»  (e) 

1 

5 

1 

i 

0 

3 

-3 

u  = 

0 

1 

1 

5 

i 

u  = 

0 

3 

6J 

V-b 

u 

1 

1 

5/ 

Vo  J 

9.6.8.  Use  the  Conjugate  Gradient  Method  to  solve  the  system  in  Exercise  9.4.33.  How  many 
iterations  do  you  need  to  obtain  the  solution  that  is  accurate  to  2  decimal  places?  How 
does  this  compare  to  the  Jacobi  and  SOR  Methods? 


9.6.9.  According  to  Example  3.39,  the  n  x  n  Hilbert  matrix  Hn  is  positive  definite,  and  hence 
we  can  apply  the  Conjugate  Gradient  Method  to  solve  the  linear  system  Hn u  =  f.  For  the 
values  n  =  5, 10,  30,  let  u*  £  IRn  be  the  vector  with  all  entries  equal  to  1. 

(a)  Compute  f  =  Hn u*.  (b)  Use  Gaussian  Elimination  to  solve  Hn u  =  f.  How  close  is 
your  solution  to  u*?  (c)  Does  pivoting  improve  the  solution  in  part  (b)? 

(d)  Does  the  conjugate  gradient  algorithm  do  any  better? 

9.6.10. Try applying the Conjugate Gradient algorithm to the system $-x + 2y + z = -2$,
$y + 2z = 1$, $3x + y - z = 1$. Do you obtain the solution? Why or why not?


9.6.11. True or false: If the residual vector $\mathbf{r} = \mathbf{b} - A\,\mathbf{x}$ satisfies $\|\mathbf{r}\| < .01$, then $\mathbf{x}$ approximates
the true solution to within two decimal places.


0  9.6.12.  How  many  arithmetic  operations  are  needed  to  implement  one  iteration  of  the 

Conjugate  Gradient  Method?  How  many  iterations  can  you  perform  before  the  method 
becomes  more  work  than  direct  Gaussian  Elimination? 

Remark.  If  the  matrix  is  sparse,  the  number  of  operations  can  decrease  dramatically. 


0 9.6.13. Fill in the details in a direct derivation of the Conjugate Gradient algorithm following
the ideas outlined in the text: starting with the initial guess $\mathbf{x}_0$ and corresponding residual
vector $\mathbf{w}_1 = \mathbf{r}_0 = \mathbf{b}$, at the $k$th step in the algorithm, given the approximation $\mathbf{x}_k$ and
residual $\mathbf{r}_k = \mathbf{b} - A\,\mathbf{x}_k$, the $k$th conjugate direction is chosen so that $\mathbf{w}_{k+1} = \mathbf{r}_k + s_k \mathbf{w}_k$
satisfies the conjugacy conditions (9.113). The next approximation $\mathbf{x}_{k+1} = \mathbf{x}_k + t_{k+1} \mathbf{w}_{k+1}$
is chosen so that its residual $\mathbf{r}_{k+1} = \mathbf{b} - A\,\mathbf{x}_{k+1}$ is as small as possible.

0 9.6.14. In (9.120), find the value of $d_k$ that minimizes $p(\mathbf{x}_{k+1})$.
4b 9.6.15. Use the direct gradient descent algorithm (9.120) using the value of $d_k$ found in
Exercise 9.6.14 to solve the linear systems in Exercise 9.6.7. Compare the speed of
convergence with that of the Conjugate Gradient Method.

4» 9.6.16. Use GMRES to solve the system in Exercise 9.4.33. Compare the rate of convergence
with the CG algorithm in Exercise 9.6.8.

X 9.6.17. Is GMRES able to solve the system in Exercise 9.6.10?

9.6.18. Explain in what sense the GMRES approximation $\mathbf{x}_{k+1}$ of order $k+1$ is a better
approximation to the true solution than that of order $k$, namely $\mathbf{x}_k$.

9.6.19. (a) Explain what happens to the GMRES algorithm if the right-hand side $\mathbf{b}$ of the
linear system $A\,\mathbf{x} = \mathbf{b}$ is an eigenvector of $A$. (b) More generally, prove that if the Krylov
subspaces generated by $\mathbf{b}$ stabilize at order $m$, then the solution of the linear system lies in
$V^{(m)}$, and so the GMRES algorithm converges to the solution at order $m$.


9.7  Wavelets 

Trigonometric Fourier series, both continuous and discrete, are amazingly powerful, but
they do suffer from one potentially serious defect. The complex exponential basis functions
$e^{ikx} = \cos kx + i \sin kx$ are spread out over the entire interval $[-\pi, \pi]$, and so are not well
suited to processing localized signals, meaning data that are concentrated in relatively
small regions. Ideally, one would like to construct a system of functions that is orthogonal,
and so has all the advantages of the Fourier basis functions, but, in addition, adapts to
localized structures in signals. This dream was the inspiration for the development of the
modern theory of wavelets.

The  Haar  Wavelets 


Although the modern era of wavelets started in the mid 1980s, the simplest example of a
wavelet basis was discovered by the Hungarian mathematician Alfred Haar in 1910, [35].
We consider the space of functions (signals) defined on the interval $[0,1]$, equipped with the
standard $L^2$ inner product

$$\langle f, g \rangle = \int_0^1 f(x)\,g(x)\,dx. \tag{9.124}$$

The usual scaling arguments can be used to adapt the wavelet formulas to any other
interval.

The Haar wavelets are certain piecewise constant functions. The initial four are graphed
in Figure 9.6. The first is the box function

$$\varphi_1(x) = \varphi(x) = \begin{cases} 1, & 0 < x \le 1, \\ 0, & \text{otherwise}, \end{cases} \tag{9.125}$$


known  as  the  scaling  function ,  for  reasons  that  shall  appear  shortly.  Although  we  are 
interested  in  the  value  of  <p(x)  only  on  the  interval  [0,1],  it  will  be  convenient  to  extend  it, 
and  all  the  other  wavelets,  to  be  zero  outside  the  basic  interval.  The  second  Haar  function 


$$\varphi_2(x) = w(x) = \begin{cases} 1, & 0 < x \le \frac12, \\ -1, & \frac12 < x \le 1, \\ 0, & \text{otherwise}, \end{cases} \tag{9.126}$$

is known as the mother wavelet. The third and fourth Haar functions are compressed
versions of the mother wavelet:

$$\varphi_3(x) = w(2x) = \begin{cases} 1, & 0 < x \le \frac14, \\ -1, & \frac14 < x \le \frac12, \\ 0, & \text{otherwise}, \end{cases} \qquad \varphi_4(x) = w(2x - 1) = \begin{cases} 1, & \frac12 < x \le \frac34, \\ -1, & \frac34 < x \le 1, \\ 0, & \text{otherwise}, \end{cases}$$

called daughter wavelets. One can easily check, by direct evaluation of the integrals, that
the four Haar wavelet functions are orthogonal with respect to the $L^2$ inner product (9.124):
$\langle \varphi_i, \varphi_j \rangle = 0$ when $i \ne j$.

Figure 9.6. The First Four Haar Wavelets: $\varphi_1(x)$, $\varphi_2(x)$, $\varphi_3(x)$, $\varphi_4(x)$.

The scaling transformation $x \mapsto 2x$ serves to compress the wavelet function, while the
translation $2x \mapsto 2x - 1$ moves the compressed version to the right by half a unit.
Furthermore, we can represent the mother wavelet by compressing and translating the
scaling function:

$$w(x) = \varphi(2x) - \varphi(2x - 1). \tag{9.127}$$

It is these two operations of scaling and translation, coupled with the all-important
orthogonality, that underlie the power of wavelets.
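The orthogonality of the first four Haar functions, and the way (9.127) builds the mother wavelet from the box function, can be checked numerically. The sketch below is only an illustration: the function names `phi` and `w` and the grid size `N` are choices made here, and the integrals in (9.124) are evaluated by the midpoint rule, which is exact for these piecewise constant functions because no midpoint falls on a breakpoint.

```python
import numpy as np

def phi(x):
    """Box (scaling) function: 1 on (0, 1], 0 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where((x > 0) & (x <= 1), 1.0, 0.0)

def w(x):
    """Haar mother wavelet, built from the box function as in (9.127)."""
    x = np.asarray(x, dtype=float)
    return phi(2 * x) - phi(2 * x - 1)

# Midpoints of N equal subintervals of [0, 1]; N is a multiple of 4.
N = 1024
x = (np.arange(N) + 0.5) / N

# phi_1, phi_2, phi_3, phi_4 sampled on the grid.
funcs = [phi(x), w(x), w(2 * x), w(2 * x - 1)]

# Gram matrix of L^2 inner products <phi_i, phi_j> on [0, 1].
gram = np.array([[np.mean(f * g) for g in funcs] for f in funcs])
print(gram)
```

The off-diagonal entries vanish, confirming orthogonality, while the diagonal entries are the squared norms of the four functions.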

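The Haar wavelets have an evident discretization. If we decompose the interval $(0, 1]$
into the four subintervals

$$\left(0, \tfrac14\right], \qquad \left(\tfrac14, \tfrac12\right], \qquad \left(\tfrac12, \tfrac34\right], \qquad \left(\tfrac34, 1\right], \tag{9.128}$$

on which the four wavelet functions are constant, then we can represent each of them by
a vector in $\mathbb{R}^4$ whose entries are the values taken by each wavelet function on the
successive subintervals. In this manner, we obtain the wavelet sample vectors

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \qquad \mathbf{v}_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \qquad \mathbf{v}_4 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix}, \tag{9.129}$$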


which form the orthogonal wavelet basis of $\mathbb{R}^4$ we first encountered in Examples 2.35
and 4.10. Orthogonality of the vectors (9.129) with respect to the standard Euclidean dot
product is equivalent to orthogonality of the Haar wavelet functions with respect to the
inner product (9.124). Indeed, if

$$f(x) \sim \mathbf{f} = (f_1, f_2, f_3, f_4) \qquad \text{and} \qquad g(x) \sim \mathbf{g} = (g_1, g_2, g_3, g_4)$$

are piecewise constant real functions that achieve the indicated values on the four
subintervals (9.128), then their $L^2$ inner product

$$\langle f, g \rangle = \int_0^1 f(x)\,g(x)\,dx = \tfrac14\,(f_1 g_1 + f_2 g_2 + f_3 g_3 + f_4 g_4) = \tfrac14\,\mathbf{f} \cdot \mathbf{g}$$

is equal to the averaged dot product of their sample values, the real form of the inner
product (5.104) that was used in the discrete Fourier transform.

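Since the vectors (9.129) form an orthogonal basis of $\mathbb{R}^4$, we can uniquely decompose
such a piecewise constant function as a linear combination of wavelets

$$f(x) = c_1 \varphi_1(x) + c_2 \varphi_2(x) + c_3 \varphi_3(x) + c_4 \varphi_4(x),$$

or, equivalently, in terms of the sample vectors,

$$\mathbf{f} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + c_3 \mathbf{v}_3 + c_4 \mathbf{v}_4.$$

The required coefficients

$$c_k = \frac{\langle f, \varphi_k \rangle}{\|\varphi_k\|^2} = \frac{\mathbf{f} \cdot \mathbf{v}_k}{\|\mathbf{v}_k\|^2}$$

are fixed by our usual orthogonality formula (4.7). Explicitly,

$$c_1 = \tfrac14\,(f_1 + f_2 + f_3 + f_4), \qquad c_2 = \tfrac14\,(f_1 + f_2 - f_3 - f_4), \qquad c_3 = \tfrac12\,(f_1 - f_2), \qquad c_4 = \tfrac12\,(f_3 - f_4).$$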
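As a small illustration of these coefficient formulas, the following sketch decomposes a four-sample signal in the basis (9.129) and then reassembles it. The sample values in `f` are made up for the example, and the matrix name `V` is a choice made here.

```python
import numpy as np

# Columns are the wavelet sample vectors v1, v2, v3, v4 of (9.129).
V = np.array([[1.0,  1.0,  1.0,  0.0],
              [1.0,  1.0, -1.0,  0.0],
              [1.0, -1.0,  0.0,  1.0],
              [1.0, -1.0,  0.0, -1.0]])

# Hypothetical sample values f1, ..., f4 of a piecewise constant signal.
f = np.array([4.0, 2.0, 5.0, 5.0])

# c_k = (f . v_k) / ||v_k||^2, computed for all four columns at once.
c = (V.T @ f) / np.sum(V ** 2, axis=0)
print(c)        # wavelet coefficients c1, c2, c3, c4

# Reconstruction f = c1 v1 + c2 v2 + c3 v3 + c4 v4.
print(V @ c)
```

Here $c_1 = 4$ is the average of the samples, $c_2 = -1$ records that the left half sits below the right half, and $c_4 = 0$ reflects the fact that the signal is constant on the right half.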

Before  proceeding  to  the  more  general  case,  let  us  introduce  an  important  analytical 
definition  that  quantifies  precisely  how  localized  a  function  is. 


Definition 9.52. The support of a function $f(x)$, written $\operatorname{supp} f$, is the closure of the set
where $f(x) \ne 0$.


Thus, a point will belong to the support of $f(x)$ if $f$ is not zero there, or at least is not
zero at nearby points. More precisely:


Lemma 9.53. If $f(a) \ne 0$, then $a \in \operatorname{supp} f$. More generally, a point $a \in \operatorname{supp} f$ if and only
if there exists a convergent sequence $x_n \to a$ such that $f(x_n) \ne 0$. Conversely, $a \notin \operatorname{supp} f$
if and only if $f(x) \equiv 0$ on an interval $a - \delta < x < a + \delta$ for some $\delta > 0$.


Intuitively, the smaller the support of a function, the more localized it is. For example,
the support of the Haar mother wavelet (9.126) is $\operatorname{supp} w = [0,1]$; the point $x = 0$ is
included, even though $w(0) = 0$, because $w(x) \ne 0$ at nearby points. The two daughter
wavelets have smaller support:

$$\operatorname{supp} \varphi_3 = \left[0, \tfrac12\right], \qquad \operatorname{supp} \varphi_4 = \left[\tfrac12, 1\right],$$

and so are twice as localized.

The  effect  of  scalings  and  translations  on  the  support  of  a  function  is  easily  discerned. 


Lemma 9.54. If $\operatorname{supp} f = [a, b]$, and

$$g(x) = f(r x - \delta), \qquad \text{then} \qquad \operatorname{supp} g = \left[\frac{a + \delta}{r}, \frac{b + \delta}{r}\right].$$

In other words, scaling $x$ by a factor $r$ compresses the support of the function by a
factor $1/r$, while translating $x$ translates the support of the function.
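As a quick check of the lemma against the supports already found, take $f = w$ with $\operatorname{supp} w = [0, 1]$ and $g(x) = w(2x - 1) = \varphi_4(x)$, so that $r = 2$ and $\delta = 1$ (a worked instance added here for illustration):

$$\operatorname{supp} \varphi_4 = \left[\frac{0 + 1}{2}, \frac{1 + 1}{2}\right] = \left[\tfrac12, 1\right],$$

in agreement with the support of the second daughter wavelet computed directly from its definition. Similarly, $\varphi_3(x) = w(2x)$ has $r = 2$, $\delta = 0$, giving $\operatorname{supp} \varphi_3 = \left[0, \tfrac12\right]$.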

The key requirement for a wavelet basis is that it contains functions with arbitrarily
small support. To this end, the full Haar wavelet basis is obtained from the mother wavelet
by iterating the scaling and translation processes. We begin with the scaling function

$$\varphi(x) = \begin{cases} 1, & 0 < x \le 1, \\ 0, & \text{otherwise}, \end{cases} \tag{9.130}$$

from which we construct the mother wavelet via (9.127). For each "generation" $j \ge 0$, we
form the wavelet offspring by first compressing the mother wavelet so that its support fits
into an interval of length $2^{-j}$,

$$w_{j,0}(x) = w(2^j x), \qquad \text{so that} \qquad \operatorname{supp} w_{j,0} = [0, 2^{-j}], \tag{9.131}$$

and then translating $w_{j,0}$ so as to fill up the entire interval $[0, 1]$ by $2^j$ subintervals, each
of length $2^{-j}$, defining

$$w_{j,k}(x) = w_{j,0}(x - 2^{-j} k) = w(2^j x - k), \qquad \text{where} \quad k = 0, 1, \ldots, 2^j - 1. \tag{9.132}$$

Lemma 9.54 implies that $\operatorname{supp} w_{j,k} = [2^{-j} k, 2^{-j}(k + 1)]$, and so the combined supports
of all the $j$th generation wavelets fill the entire interval: $\bigcup_{k=0}^{2^j - 1} \operatorname{supp} w_{j,k} = [0, 1]$. The
primal generation, $j = 0$, consists of just the mother wavelet

$$w_{0,0}(x) = w(x).$$

The first generation, $j = 1$, consists of the two daughter wavelets already introduced as $\varphi_3$
and $\varphi_4$, namely

$$w_{1,0}(x) = w(2x), \qquad w_{1,1}(x) = w(2x - 1).$$

The second generation, $j = 2$, appends four additional granddaughter wavelets to our basis:

$$w_{2,0}(x) = w(4x), \qquad w_{2,1}(x) = w(4x - 1), \qquad w_{2,2}(x) = w(4x - 2), \qquad w_{2,3}(x) = w(4x - 3).$$

The 8 Haar wavelets $\varphi, w_{0,0}, w_{1,0}, w_{1,1}, w_{2,0}, w_{2,1}, w_{2,2}, w_{2,3}$ are constant on the 8
subintervals of length $\frac18$, taking the successive sample values indicated by the columns of the
wavelet matrix

$$W_8 = \begin{pmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 0 \\
1 & 1 & -1 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & -1 & 0 & 0 & -1 & 0 & 0 \\
1 & -1 & 0 & 1 & 0 & 0 & 1 & 0 \\
1 & -1 & 0 & 1 & 0 & 0 & -1 & 0 \\
1 & -1 & 0 & -1 & 0 & 0 & 0 & 1 \\
1 & -1 & 0 & -1 & 0 & 0 & 0 & -1
\end{pmatrix}. \tag{9.133}$$

Orthogonality of the wavelets is manifested in the orthogonality of the columns of $W_8$.
(Unfortunately, terminological constraints prevent us from calling $W_8$ an orthogonal matrix,
because its columns are not orthonormal!)
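The wavelet matrix can be generated directly from the definition (9.132) by sampling each wavelet on the subintervals. The sketch below is illustrative: the function name `haar_matrix` and its argument convention (the number of subintervals is `2**n`) are choices made here, not notation from the text, and each function is sampled at the subinterval midpoints, where the value is unambiguous.

```python
import numpy as np

def haar_matrix(n):
    """Matrix whose columns sample phi and w_{j,k}, 0 <= j < n, on 2**n subintervals."""
    N = 2 ** n
    x = (np.arange(N) + 0.5) / N          # midpoints of the N subintervals

    def w(t):
        # Haar mother wavelet: +1 on (0, 1/2], -1 on (1/2, 1], 0 otherwise
        return (np.where((t > 0) & (t <= 0.5), 1.0, 0.0)
                - np.where((t > 0.5) & (t <= 1), 1.0, 0.0))

    cols = [np.ones(N)]                   # scaling function phi
    for j in range(n):
        for k in range(2 ** j):
            cols.append(w(2 ** j * x - k))  # w_{j,k}(x) = w(2^j x - k)
    return np.column_stack(cols)

W8 = haar_matrix(3)
print(W8.astype(int))

# The columns are orthogonal but not of unit length, so the
# Gram matrix W8^T W8 is diagonal without being the identity.
G = W8.T @ W8
print(np.diag(G))
```

Because $W_8^T W_8$ is a diagonal matrix $D$, the inverse is available without any elimination as $W_8^{-1} = D^{-1} W_8^T$.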

The $n$th stage consists of $2^{n+1}$ different wavelet functions comprising the scaling
function and all the generations up to the $n$th: $w_0(x) = \varphi(x)$ and $w_{j,k}(x)$ for $0 \le j \le n$ and
$0 \le k < 2^j$. They are all constant on each subinterval of length $2^{-n-1}$.


Theorem 9.55. The wavelet functions $\varphi(x)$, $w_{j,k}(x)$ form an orthogonal system with
respect to the inner product (9.124).
Proof: First, note that each wavelet $w_{j,k}(x)$ is equal to $+1$ on an interval of length $2^{-j-1}$
and to $-1$ on an adjacent interval of the same length. Therefore,

$$\langle w_{j,k}, \varphi \rangle = \int_0^1 w_{j,k}(x)\,dx = 0, \tag{9.134}$$

since the $+1$ and $-1$ contributions cancel each other. If two different wavelets $w_{j,k}$ and
$w_{l,m}$ with, say, $j \le l$, have supports that are either disjoint, or just overlap at a single
point, then their product $w_{j,k}(x)\,w_{l,m}(x) \equiv 0$, and so their inner product is clearly zero:

$$\langle w_{j,k}, w_{l,m} \rangle = \int_0^1 w_{j,k}(x)\,w_{l,m}(x)\,dx = 0.$$

Otherwise, except in the case when the two wavelets are identical, the support of $w_{l,m}$ is
entirely contained in an interval where $w_{j,k}$ is constant, and so $w_{j,k}(x)\,w_{l,m}(x) = \pm\,w_{l,m}(x)$.
Therefore, by (9.134),

$$\langle w_{j,k}, w_{l,m} \rangle = \int_0^1 w_{j,k}(x)\,w_{l,m}(x)\,dx = \pm \int_0^1 w_{l,m}(x)\,dx = 0.$$

Finally, we compute

$$\|\varphi\|^2 = \int_0^1 dx = 1, \qquad \|w_{j,k}\|^2 = \int_0^1 w_{j,k}(x)^2\,dx = 2^{-j}. \tag{9.135}$$

The second formula follows from the fact that $|w_{j,k}(x)| = 1$ on an interval of length $2^{-j}$
and is 0 elsewhere. Q.E.D.


The wavelet series of a signal $f(x)$ is given by

$$f(x) \sim c_0\,\varphi(x) + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} c_{j,k}\,w_{j,k}(x). \tag{9.136}$$

Orthogonality implies that the wavelet coefficients $c_0$, $c_{j,k}$ can be immediately computed
using the standard inner product formula coupled with (9.135):

$$\begin{aligned}
c_0 &= \frac{\langle f, \varphi \rangle}{\|\varphi\|^2} = \int_0^1 f(x)\,dx, \\
c_{j,k} &= \frac{\langle f, w_{j,k} \rangle}{\|w_{j,k}\|^2} = 2^j \int_{2^{-j}k}^{2^{-j}k + 2^{-j-1}} f(x)\,dx \; - \; 2^j \int_{2^{-j}k + 2^{-j-1}}^{2^{-j}(k+1)} f(x)\,dx.
\end{aligned} \tag{9.137}$$

The convergence properties of the Haar wavelet series (9.136) are similar to those of Fourier
series, [61, 77]; full details can be found in [18, 88].
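The coefficient formulas (9.137) are easy to evaluate when an antiderivative of the signal is known. The following sketch assumes this: the helper names `haar_coefficients` and `haar_partial_sum` are hypothetical, and the signal $f(x) = e^x$ (with antiderivative $F(x) = e^x$) is chosen purely for illustration. It also checks a consequence of orthogonality: the partial sum over $j = 0, \ldots, r$ equals the average of $f$ on each subinterval of length $2^{-r-1}$.

```python
import numpy as np

def haar_coefficients(F, r):
    """Haar coefficients (9.137) of f on [0, 1], given an antiderivative F of f.

    Returns c0 and a dict {(j, k): c_jk} for generations 0 <= j <= r.
    """
    c0 = F(1.0) - F(0.0)
    c = {}
    for j in range(r + 1):
        h = 2.0 ** (-j)
        for k in range(2 ** j):
            left, mid, right = k * h, (k + 0.5) * h, (k + 1) * h
            # 2^j * (integral over left half minus integral over right half)
            c[(j, k)] = 2 ** j * ((F(mid) - F(left)) - (F(right) - F(mid)))
    return c0, c

def haar_partial_sum(c0, c, x):
    """Evaluate c0*phi(x) + sum of c_jk * w(2^j x - k) at points x in (0, 1]."""
    x = np.asarray(x, dtype=float)
    s = np.full_like(x, c0)
    for (j, k), cjk in c.items():
        t = 2 ** j * x - k
        s += cjk * (np.where((t > 0) & (t <= 0.5), 1.0, 0.0)
                    - np.where((t > 0.5) & (t <= 1), 1.0, 0.0))
    return s

F = np.exp                        # antiderivative of the illustrative signal f(x) = exp(x)
c0, c = haar_coefficients(F, r=3)

N = 16                            # 2^(r+1) subintervals for r = 3
x = (np.arange(N) + 0.5) / N      # one sample point per subinterval
s = haar_partial_sum(c0, c, x)

# Average of f over each subinterval, computed from the antiderivative.
means = N * (F((np.arange(N) + 1) / N) - F(np.arange(N) / N))
print(np.max(np.abs(s - means)))
```

The printed discrepancy is at the level of rounding error, which illustrates why the Haar partial sums look like staircase approximations to the signal.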


Example 9.56. In Figure 9.7, we plot the Haar expansions of the signal displayed in
the first plot. The following plots show the partial sums for the Haar wavelet series (9.136)
over $j = 0, \ldots, r$ with $r = 2, 3, 4, 5, 6$. Since the wavelets are themselves discontinuous,
they do not have any difficulty converging to a discontinuous function. On the other hand,
it takes quite a few wavelets to begin to accurately reproduce the signal; in the last plot,
we are combining a total of $2^6 = 64$ Haar wavelets.

Figure 9.7. Haar Wavelet Expansion.


Exercises 


4b 9.7.1. Let $f(x) = x$. (a) Determine its Haar wavelet coefficients $c_{j,k}$. (b) Graph the partial
sums $s_r(x)$ of the Haar wavelet series (9.136) where $j$ goes from 0 to $r = 2$, 5, and
10. Compare your graphs with that of $f$ and discuss what you observe. Is the series
converging to the function? Can you prove this? (c) What is the maximal deviation
$\|f - s_r\|_\infty = \max\{\,|f(x) - s_r(x)| \;:\; 0 \le x \le 1\,\}$ for each of your partial sums?


4b  9.7.2.  Answer  Exercise  9.7.1  for  the  functions 

r\ 

(a)  x  —  x ,  (b)  cos  7rx,  (c) 


—  X 


—  x 


0  <  x  < 

^  <  x  <  1. 


0  <  x  <  ^7 r, 

3  7 r  <  X  <  g7 r. 


9.7.3.  In  this  exercise,  we  investigate  the  compression  capabilities  of  the  Haar  wavelets.  Let 

—x, 

f(x)  =  x  —  1 7 r,  ^7r  <  x  <  |7r,  represent  a  signal  defined  on  0  <  x  <  1.  Let 
,  —X  +  27 r,  |7T  <  X  <  27T, 

$s_r(x)$ denote the $r$th partial sum, from $j = 0$ to $r$, of the Haar wavelet series (9.136).

(a) How many different Haar wavelet coefficients $c_{j,k}$ appear in $s_r(x)$? If our criterion
for compression is that $\|f - s_r\| < \varepsilon$, how large do you need to choose $r$ when $\varepsilon = .1$?
$\varepsilon = .01$? $\varepsilon = .001$? (b) Compare the Haar wavelet compression with the discrete Fourier
method of Exercise 5.6.10.


C  9.7.4.  (a)  Explain  why  the  wavelet  expansion  (9.136)  defines  a  linear  transformation  on  IRn 

T 

that  takes  a  wavelet  coefficient  vector  c  =  (  c0,  c1? . . . ,  cn_1  )  to  the  corresponding  sample 

T 

vector  f  =  (/0,  /1? . . . ,  fn_i  )  .  (b)  According  to  Theorem  7.5,  the  wavelet  map  must  be 

given  by  matrix  multiplication  f  =  Wn  c  by  a  2  x  2n  matrix  W  =  Wn.  Construct  W2,  W3 
and  W4.  (c)  Prove  that  the  columns  of  Wn  are  obtained  as  the  values  of  the  wavelet  basis 
functions  on  the  2n  sample  intervals,  (d)  Prove  that  the  columns  of  Wn  are  orthogonal. 

(e) Is $W_n$ an orthogonal matrix? Find a formula for $W_n^{-1}$. (f) Explain why the wavelet
transform is given by the linear map $\mathbf{c} = W_n^{-1}\,\mathbf{f}$.

4b 9.7.5. Test the noise removal features of the Haar wavelets by adding random noise to one of
the  functions  in  Exercises  9.7.1  and  9.7.2,  computing  the  wavelet  series,  and  then  setting 
the  high  “frequency”  modes  to  zero.  What  do  you  observe?  Is  this  a  reasonable  denoising 
algorithm  when  compared  with  a  Fourier  method? 

9.7.6.  Write  the  Haar  scaling  function  and  mother  wavelet  as  linear  combinations  of  step 
functions. 

0  9.7.7.  Prove  Lemma  9.54. 


Modern  Wavelets 

The main defect of the Haar wavelets is that they do not provide a very efficient means
of representing even very simple functions; it takes quite a large number of wavelets
to reproduce signals with any degree of precision. The reason for this is that the Haar
wavelets are piecewise constant, and so even an affine function $y = \alpha x + \beta$ requires many
sample values, and hence a relatively extensive collection of Haar wavelets, to be accurately
reproduced. In particular, compression and denoising algorithms based on Haar wavelets
are either insufficiently precise or hopelessly inefficient, and hence of minor practical value.

For a long time it was thought that it was impossible to simultaneously achieve the
requirements of localization, orthogonality and accurate reproduction of simple functions.
The breakthrough came in 1988, when the Belgian mathematician Ingrid Daubechies
produced the first examples of wavelet bases that realized all three basic criteria. Since then,
wavelets have developed into a sophisticated and burgeoning industry with major impact
on modern technology. Significant applications include compression, storage and
recognition of fingerprints in the FBI's database, and the JPEG2000 image format, which, unlike
earlier Fourier-based JPEG standards, incorporates wavelet technology in its image
compression and reconstruction algorithms. In this section, we will present a brief outline of
the basic ideas underlying Daubechies' remarkable construction.

The recipe for any wavelet system involves two basic ingredients: a scaling function
and a mother wavelet. The latter can be constructed from the scaling function by a
prescription similar to that in (9.127), and therefore we first concentrate on the properties
of the scaling function. The key requirement is that the scaling function must solve a
dilation equation of the form

$$\varphi(x) = \sum_{k=0}^{p} c_k\,\varphi(2x - k) = c_0\,\varphi(2x) + c_1\,\varphi(2x - 1) + \cdots + c_p\,\varphi(2x - p) \tag{9.138}$$

for some collection of constants $c_0, \ldots, c_p$. The dilation equation relates the function $\varphi(x)$
to a finite linear combination of its compressed translates. The coefficients $c_0, \ldots, c_p$ are
not arbitrary, since the properties of orthogonality and localization will impose certain
rather stringent requirements.

Example 9.57. The Haar or box scaling function (9.125) satisfies the dilation equation
(9.138) with $c_0 = c_1 = 1$, namely

$$\varphi(x) = \varphi(2x) + \varphi(2x - 1). \tag{9.139}$$

We recommend that you convince yourself of the validity of this identity before continuing.

Example 9.58. Another example of a scaling function is the hat function

$$\varphi(x) = \begin{cases} x, & 0 \le x \le 1, \\ 2 - x, & 1 \le x \le 2, \\ 0, & \text{otherwise}, \end{cases} \tag{9.140}$$

graphed in Figure 9.8. The hat function satisfies the dilation equation

$$\varphi(x) = \tfrac12\,\varphi(2x) + \varphi(2x - 1) + \tfrac12\,\varphi(2x - 2), \tag{9.141}$$

which is (9.138) with $c_0 = \frac12$, $c_1 = 1$, $c_2 = \frac12$. Again, the reader should be able to check
this identity by hand.
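Both dilation identities can also be confirmed numerically before checking them by hand. In the sketch below the function names `box`, `hat`, and `dilation_rhs`, as well as the sampling grid, are choices made for this illustration; the code simply compares each scaling function with the right-hand side of (9.138) on a grid of points.

```python
import numpy as np

def box(x):
    """Haar scaling function (9.125): 1 on (0, 1], 0 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where((x > 0) & (x <= 1), 1.0, 0.0)

def hat(x):
    """Hat function (9.140): rises on [0, 1], falls on [1, 2], 0 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, 1.0 - np.abs(x - 1.0))

def dilation_rhs(phi, coeffs, x):
    """Right-hand side of (9.138): sum over k of c_k * phi(2x - k)."""
    return sum(c * phi(2 * x - k) for k, c in enumerate(coeffs))

x = np.linspace(-1.0, 3.0, 2001)

# (9.139): box function with c0 = c1 = 1.
box_err = np.max(np.abs(box(x) - dilation_rhs(box, [1.0, 1.0], x)))

# (9.141): hat function with c0 = 1/2, c1 = 1, c2 = 1/2.
hat_err = np.max(np.abs(hat(x) - dilation_rhs(hat, [0.5, 1.0, 0.5], x)))

print(box_err, hat_err)
```

A grid check of this kind is evidence rather than proof, but for piecewise linear functions with breakpoints at multiples of $\frac12$ it leaves little room for error.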


The  dilation  equation  (9.138)  is  a  kind  of  functional  equation ,  and,  as  such,  is  not  so 
easy  to  solve.  Indeed,  the  mathematics  of  functional  equations  remains  much  less  well 
developed  than  that  of  differential  equations  or  integral  equations.  Even  to  prove  that 
(nonzero)  solutions  exist  is  a  nontrivial  analytical  problem.  Since  we  already  know  two 
explicit  examples,  let  us  defer  the  discussion  of  solution  techniques  until  we  understand 
how  the  dilation  equation  can  be  used  to  construct  a  wavelet  basis. 

Given a solution to the dilation equation, we define the mother wavelet to be

$$w(x) = \sum_{k=0}^{p} (-1)^k\,c_{p-k}\,\varphi(2x - k) = c_p\,\varphi(2x) - c_{p-1}\,\varphi(2x - 1) + c_{p-2}\,\varphi(2x - 2) + \cdots \pm c_0\,\varphi(2x - p). \tag{9.142}$$

This formula directly generalizes the Haar wavelet relation (9.127), in light of its dilation
equation (9.139). The daughter wavelets are then all found, as in the Haar basis, by
iteratively compressing and translating the mother wavelet:

$$w_{j,k}(x) = w(2^j x - k). \tag{9.143}$$

In the general framework, we do not necessarily restrict our attention to the interval $[0,1]$,
and so $j$ and $k$ can, in principle, be arbitrary integers.

Let us investigate what sort of conditions should be imposed on the dilation coefficients
$c_0, \ldots, c_p$ in order that we obtain a viable wavelet basis by this construction. First,
localization of the wavelets requires that the scaling function have bounded support, and
so $\varphi(x) \equiv 0$ when $x$ lies outside some bounded interval $[a, b]$. Integrating both sides of
(9.138) produces

$$\int_a^b \varphi(x)\,dx = \int_{-\infty}^{\infty} \varphi(x)\,dx = \sum_{k=0}^{p} c_k \int_{-\infty}^{\infty} \varphi(2x - k)\,dx. \tag{9.144}$$

Performing the change of variables $y = 2x - k$, with $dx = \frac12\,dy$, we obtain

$$\int_{-\infty}^{\infty} \varphi(2x - k)\,dx = \frac12 \int_{-\infty}^{\infty} \varphi(y)\,dy = \frac12 \int_a^b \varphi(x)\,dx, \tag{9.145}$$

where we revert to $x$ as our (dummy) integration variable. We substitute this result back
into (9.144). Assuming that $\int_a^b \varphi(x)\,dx \ne 0$, we discover that the dilation coefficients must
satisfy

$$c_0 + \cdots + c_p = 2. \tag{9.146}$$

The second condition we require is orthogonality of the wavelets. For simplicity, we
only consider the standard $L^2$ inner product†

$$\langle f, g \rangle = \int_{-\infty}^{\infty} f(x)\,g(x)\,dx.$$

It turns out that the orthogonality of the complete wavelet system is guaranteed once we
know that the scaling function $\varphi(x)$ is orthogonal to all its integer translates:

$$\langle \varphi(x), \varphi(x - m) \rangle = \int_{-\infty}^{\infty} \varphi(x)\,\varphi(x - m)\,dx = 0 \qquad \text{for all} \quad m \ne 0. \tag{9.147}$$

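We first note the formula

$$\langle \varphi(2x - k), \varphi(2x - l) \rangle = \int_{-\infty}^{\infty} \varphi(2x - k)\,\varphi(2x - l)\,dx = \frac12 \int_{-\infty}^{\infty} \varphi(x)\,\varphi(x + k - l)\,dx = \frac12\,\langle \varphi(x), \varphi(x + k - l) \rangle, \tag{9.148}$$

which follows from the same change of variables $y = 2x - k$ used in (9.145). Therefore, since $\varphi$
satisfies the dilation equation (9.138), we have

$$\begin{aligned}
\langle \varphi(x), \varphi(x - m) \rangle &= \Big\langle \sum_{j=0}^{p} c_j\,\varphi(2x - j), \; \sum_{k=0}^{p} c_k\,\varphi(2x - 2m - k) \Big\rangle \\
&= \sum_{j,k=0}^{p} c_j c_k\,\langle \varphi(2x - j), \varphi(2x - 2m - k) \rangle = \frac12 \sum_{j,k=0}^{p} c_j c_k\,\langle \varphi(x), \varphi(x + j - 2m - k) \rangle.
\end{aligned} \tag{9.149}$$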

If we require orthogonality (9.147) of all the integer translates of $\varphi$, then the left-hand side
of this identity will be 0 unless $m = 0$, while only the summands with $j = 2m + k$ will be
nonzero on the right. Therefore, orthogonality requires that

$$\sum_{0 \le k \le p - 2m} c_{2m+k}\,c_k = \begin{cases} 2, & m = 0, \\ 0, & m \ne 0. \end{cases} \tag{9.150}$$

The algebraic equations (9.146), (9.150) for the dilation coefficients are the key requirements
for the construction of an orthogonal wavelet basis.

For example, if we have just two nonzero coefficients $c_0, c_1$, then (9.146, 150) reduce to
$$c_0 + c_1 = c_0^2 + c_1^2 = 2,$$
and so $c_0 = c_1 = 1$ is the only solution, resulting in the Haar dilation equation (9.139). If we have three coefficients $c_0, c_1, c_2$, then (9.146), (9.150) require
$$c_0 + c_1 + c_2 = c_0^2 + c_1^2 + c_2^2 = 2, \qquad c_0\,c_2 = 0.$$


† In all instances, the functions have bounded support, and so the inner product integral can be reduced to an integral over a finite interval where both $f$ and $g$ are nonzero.

Thus either $c_2 = 0$, $c_0 = c_1 = 1$, and we are back to the Haar case, or $c_0 = 0$, $c_1 = c_2 = 1$, and the resulting dilation equation is a simple reformulation of the Haar case. In particular, the hat function (9.140) does not give rise to orthogonal wavelets.

The remarkable fact, discovered by Daubechies, is that there is a nontrivial solution for four (and, indeed, any even number) of nonzero coefficients $c_0, c_1, c_2, c_3$. The basic equations (9.146), (9.150) require
$$c_0 + c_1 + c_2 + c_3 = 2, \qquad c_0^2 + c_1^2 + c_2^2 + c_3^2 = 2, \qquad c_0\,c_2 + c_1\,c_3 = 0. \tag{9.151}$$

The particular values
$$c_0 = \frac{1+\sqrt3}{4}, \qquad c_1 = \frac{3+\sqrt3}{4}, \qquad c_2 = \frac{3-\sqrt3}{4}, \qquad c_3 = \frac{1-\sqrt3}{4}, \tag{9.152}$$
solve (9.151). These coefficients correspond to the Daubechies dilation equation
$$\varphi(x) = \frac{1+\sqrt3}{4}\,\varphi(2x) + \frac{3+\sqrt3}{4}\,\varphi(2x-1) + \frac{3-\sqrt3}{4}\,\varphi(2x-2) + \frac{1-\sqrt3}{4}\,\varphi(2x-3). \tag{9.153}$$

A nonzero solution of bounded support to this remarkable functional equation will give rise to a scaling function $\varphi(x)$, a mother wavelet
$$w(x) = \frac{1-\sqrt3}{4}\,\varphi(2x) - \frac{3-\sqrt3}{4}\,\varphi(2x-1) + \frac{3+\sqrt3}{4}\,\varphi(2x-2) - \frac{1+\sqrt3}{4}\,\varphi(2x-3), \tag{9.154}$$
and then, by compression and translation (9.143), the complete system of orthogonal wavelets $w_{j,k}(x)$.
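
The algebraic conditions (9.146) and (9.150) are easy to test numerically. The following is a minimal sketch, not part of the original text: the function name `dilation_conditions` is hypothetical, and the hat function coefficients `(1/2, 1, 1/2)` are an assumption about the form of (9.140), which is not reproduced in this passage.

```python
import math

def dilation_conditions(c, tol=1e-12):
    """Return True when c_0, ..., c_p satisfy (9.146) and (9.150)."""
    p = len(c) - 1
    # (9.146): the coefficients must sum to 2.
    if abs(sum(c) - 2.0) > tol:
        return False
    # (9.150): sum_k c_{2m+k} c_k equals 2 when m = 0 and 0 when m > 0.
    for m in range(0, p // 2 + 1):
        s = sum(c[2 * m + k] * c[k] for k in range(0, p - 2 * m + 1))
        target = 2.0 if m == 0 else 0.0
        if abs(s - target) > tol:
            return False
    return True

s3 = math.sqrt(3.0)
haar = [1.0, 1.0]
hat = [0.5, 1.0, 0.5]   # assumed coefficients of the hat function (9.140)
daubechies = [(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4]

for name, c in [("haar", haar), ("hat", hat), ("daubechies", daubechies)]:
    print(name, dilation_conditions(c))
```

Only nonnegative $m$ are checked, since the condition for $-m$ is the same sum with the two factors interchanged.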

Before explaining how to solve the Daubechies dilation equation, let us complete the proof of orthogonality. It is easy to see that, by translation invariance, since $\varphi(x)$ and $\varphi(x-m)$ are orthogonal whenever $m \ne 0$, so are $\varphi(x-k)$ and $\varphi(x-l)$ for all $k \ne l$. Next we prove orthogonality of $\varphi(x-m)$ and $w(x)$:
$$\begin{aligned}
\langle w(x), \varphi(x-m) \rangle &= \Big\langle \sum_{j=0}^{p} (-1)^{j+1} c_j\,\varphi(2x-1+j),\ \sum_{k=0}^{p} c_k\,\varphi(2x-2m-k) \Big\rangle \\
&= \sum_{j,k=0}^{p} (-1)^{j+1} c_j c_k\,\langle \varphi(2x-1+j), \varphi(2x-2m-k) \rangle \\
&= \frac12 \sum_{j,k=0}^{p} (-1)^{j+1} c_j c_k\,\langle \varphi(x), \varphi(x+1-j-2m-k) \rangle,
\end{aligned}$$
using (9.148). By orthogonality (9.147) of the translates of $\varphi$, the only summands that are nonzero are those for which $j = 1 - 2m - k$; the resulting coefficient of $\tfrac12\,\|\varphi\|^2$ is
$$\sum_{k} (-1)^k\,c_{1-2m-k}\,c_k,$$
where the sum is over all $0 \le k \le p$ such that $0 \le 1 - 2m - k \le p$. Each term in the sum appears twice, with opposite signs, and hence the result is always zero, no matter what the coefficients $c_0, \ldots, c_p$ are! The proof of orthogonality of the translates $w(x-m)$ of the mother wavelet, along with all her wavelet descendants $w(2^j x - k)$, relies on a similar argument, and the details are left as an exercise for the reader.
Figure 9.9. Approximating the Daubechies Wavelet.


Solving  the  Dilation  Equation 


Let us next discuss how to solve the dilation equation (9.138). The solution we are after does not have an elementary formula, and we require a slightly sophisticated approach to recover it. The key observation is that (9.138) has the form of a fixed point equation
$$\varphi = F[\varphi],$$
not in ordinary Euclidean space, but in an infinite-dimensional function space. With luck, the fixed point (or, more correctly, fixed function) will be stable, and so starting with a suitable initial guess $\varphi_0(x)$, the successive iterates
$$\varphi_{n+1} = F[\varphi_n]$$
will converge to the desired solution: $\varphi_n(x) \to \varphi(x)$. In detail, the iterative version of the dilation equation (9.138) reads
$$\varphi_{n+1}(x) = \sum_{k=0}^{p} c_k\,\varphi_n(2x-k), \qquad n = 0, 1, 2, \ldots. \tag{9.155}$$

Before attempting to prove convergence of this iterative procedure to the Daubechies scaling function, let us experimentally investigate what happens.

A reasonable choice for the initial guess might be the Haar scaling or box function
$$\varphi_0(x) = \begin{cases} 1, & 0 < x \le 1, \\ 0, & \text{otherwise.} \end{cases}$$
In Figure 9.9 we graph the subsequent iterates $\varphi_1(x)$, $\varphi_2(x)$, $\varphi_7(x)$. There clearly appears to be convergence to some function $\varphi(x)$, although the final result looks a little bizarre. Bolstered by this preliminary experimental evidence, we can now try to prove convergence of the iterative scheme. This turns out to be true; a fully rigorous proof relies on the Fourier transform, and can be found in [18].
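
The experiment described above can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions rather than the authors' program: the name `iterate` is hypothetical, the iteration (9.155) is evaluated recursively starting from the box function, and the early exit uses the fact that every iterate vanishes outside the open interval $(0, 3)$.

```python
import math
from functools import lru_cache

S3 = math.sqrt(3.0)
# Daubechies coefficients c_0, ..., c_3 from (9.153).
C = ((1 + S3) / 4, (3 + S3) / 4, (3 - S3) / 4, (1 - S3) / 4)

@lru_cache(maxsize=None)
def iterate(n, x):
    """Value of the n-th iterate phi_n at x, starting from the box function."""
    if x <= 0.0 or x >= 3.0:
        return 0.0                       # every iterate vanishes outside (0, 3)
    if n == 0:
        return 1.0 if x <= 1.0 else 0.0  # box function: 1 on 0 < x <= 1
    # One step of (9.155): phi_n(x) = sum_k c_k phi_{n-1}(2x - k).
    return sum(c * iterate(n - 1, 2.0 * x - k) for k, c in enumerate(C))

# Sample the seventh iterate on a grid of spacing 1/8 over [0, 3].
samples = [(m / 8, iterate(7, m / 8)) for m in range(25)]
```

Memoization keeps the cost low at dyadic sample points, because the arguments $2x - k$ that land inside $(0, 3)$ repeat from one branch of the recursion to the next.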


Theorem 9.59. The functions $\varphi_n(x)$ defined by the iterative functional equation (9.155) converge uniformly to a continuous function $\varphi(x)$, called the Daubechies scaling function.


Once  we  have  established  convergence,  we  are  now  able  to  verify  that  the  scaling  function 
and  consequential  system  of  wavelets  form  an  orthogonal  system  of  functions. 
Proposition 9.60. All integer translates $\varphi(x-k)$, for $k \in \mathbb{Z}$, of the Daubechies scaling function, and all wavelets $w_{j,k}(x) = w(2^j x - k)$, $j \ge 0$, are mutually orthogonal functions with respect to the $L^2$ inner product. Moreover, $\|\varphi\|^2 = 1$, while $\|w_{j,k}\|^2 = 2^{-j}$.

Proof: As noted earlier, the orthogonality of the entire wavelet system will follow once we know the orthogonality (9.147) of the scaling function and its integer translates. We use induction to prove that this holds for all the iterates $\varphi_n(x)$, and so, in view of uniform convergence, the limiting scaling function also satisfies this property. We already know that the orthogonality property holds for the Haar scaling function $\varphi_0(x)$. To demonstrate the induction step, we repeat the computation in (9.149), but now the left-hand side is $\langle \varphi_{n+1}(x), \varphi_{n+1}(x-m) \rangle$, while all other terms involve the previous iterate $\varphi_n$. In view of the algebraic constraints (9.150) on the wavelet coefficients and the induction hypothesis, we deduce that $\langle \varphi_{n+1}(x), \varphi_{n+1}(x-m) \rangle = 0$ whenever $m \ne 0$, while when $m = 0$, $\|\varphi_{n+1}\|^2 = \|\varphi_n\|^2$. Since $\|\varphi_0\| = 1$, we further conclude that all the iterates, and hence the limiting scaling function, have unit $L^2$ norm. The proof of the formula for the norms of the mother and daughter wavelets is left for Exercise 9.7.19. Q.E.D.


In practical computations, the limiting procedure for constructing the scaling function is not so convenient, and an alternative means of computing its values is employed. The starting point is to determine its values at integer points. First, the initial box function has values $\varphi_0(m) = 0$ for all integers $m \in \mathbb{Z}$ except $\varphi_0(1) = 1$. The iterative functional equation (9.155) will then produce the values of the iterates $\varphi_n(m)$ at integer points $m \in \mathbb{Z}$. A simple induction will convince you that $\varphi_n(m) = 0$ except for $m = 1$ and $m = 2$, and, therefore, by (9.155),
$$\varphi_{n+1}(1) = \frac{3+\sqrt3}{4}\,\varphi_n(1) + \frac{1+\sqrt3}{4}\,\varphi_n(2), \qquad \varphi_{n+1}(2) = \frac{1-\sqrt3}{4}\,\varphi_n(1) + \frac{3-\sqrt3}{4}\,\varphi_n(2),$$
since all other terms are 0. This has the form of a linear iterative system
$$\mathbf{v}^{(n+1)} = A\,\mathbf{v}^{(n)} \tag{9.156}$$
with coefficient matrix
$$A = \begin{pmatrix} \dfrac{3+\sqrt3}{4} & \dfrac{1+\sqrt3}{4} \\[2ex] \dfrac{1-\sqrt3}{4} & \dfrac{3-\sqrt3}{4} \end{pmatrix}, \qquad \text{and where} \qquad \mathbf{v}^{(n)} = \begin{pmatrix} \varphi_n(1) \\ \varphi_n(2) \end{pmatrix}.$$

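As we know, the solution to such an iterative system is specified by the eigenvalues and eigenvectors of the coefficient matrix, which are
$$\lambda_1 = 1, \quad \mathbf{v}_1 = \begin{pmatrix} \dfrac{1+\sqrt3}{4} \\[2ex] \dfrac{1-\sqrt3}{4} \end{pmatrix}, \qquad \lambda_2 = \frac12, \quad \mathbf{v}_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$
We write the initial condition as a linear combination of the eigenvectors
$$\mathbf{v}^{(0)} = \begin{pmatrix} \varphi_0(1) \\ \varphi_0(2) \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 2\,\mathbf{v}_1 - \frac{1-\sqrt3}{2}\,\mathbf{v}_2.$$
The solution is
$$\mathbf{v}^{(n)} = A^n \mathbf{v}^{(0)} = 2\,A^n \mathbf{v}_1 - \frac{1-\sqrt3}{2}\,A^n \mathbf{v}_2 = 2\,\mathbf{v}_1 - \frac{1}{2^n}\,\frac{1-\sqrt3}{2}\,\mathbf{v}_2.$$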
Figure 9.10. The Daubechies Scaling Function and Mother Wavelet.


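The limiting vector
$$\begin{pmatrix} \varphi(1) \\ \varphi(2) \end{pmatrix} = \lim_{n \to \infty} \mathbf{v}^{(n)} = 2\,\mathbf{v}_1 = \begin{pmatrix} \dfrac{1+\sqrt3}{2} \\[2ex] \dfrac{1-\sqrt3}{2} \end{pmatrix}$$
gives the desired values of the scaling function:
$$\varphi(1) = \frac{1+\sqrt3}{2} = 1.366025\ldots, \qquad \varphi(2) = \frac{1-\sqrt3}{2} = -.366025\ldots, \qquad \varphi(m) = 0 \quad \text{for all } m \ne 1, 2. \tag{9.157}$$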
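
The convergence of the integer values can also be observed by iterating the $2 \times 2$ system (9.156) directly. The sketch below is illustrative only; the names `step` and `integer_values` are hypothetical, and plain Python lists stand in for vectors and matrices.

```python
import math

s3 = math.sqrt(3.0)
# Coefficient matrix of the linear iterative system (9.156).
A = [[(3 + s3) / 4, (1 + s3) / 4],
     [(1 - s3) / 4, (3 - s3) / 4]]

def step(v):
    """One step v -> A v of the iteration."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

def integer_values(n):
    """Return [phi_n(1), phi_n(2)], starting from the box function values [1, 0]."""
    v = [1.0, 0.0]
    for _ in range(n):
        v = step(v)
    return v

# Limiting values (9.157).
limit = [(1 + s3) / 2, (1 - s3) / 2]

for n in (1, 5, 10, 20):
    v = integer_values(n)
    print(n, v, abs(v[0] - limit[0]))
```

Because the second eigenvalue is $\tfrac12$, the distance to the limit should halve at every step, in line with the explicit solution formula above.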


With this in hand, the Daubechies dilation equation (9.153) then prescribes the function values $\varphi(\tfrac12 m)$ at all half-integers, because if $x = \tfrac12 m$, then $2x - k = m - k$ is an integer. Once we know its values at the half-integers, we can reuse equation (9.153) to give its values at quarter-integers $\tfrac14 m$. Continuing onward, we determine the values of $\varphi(x)$ at all dyadic points, meaning rational numbers of the form $x = m/2^j$ for $m, j \in \mathbb{Z}$. Continuity will then prescribe its value at all other $x \in \mathbb{R}$, since $x$ can be written as the limit of dyadic numbers $x_n$, namely those obtained by truncating its binary (base 2) expansion at the $n$th digit beyond the decimal (or, rather, “binary”) point. But, in practice, this latter step is unnecessary, since all computers are ultimately based on the binary number system, and so only dyadic numbers actually reside in a computer’s memory. Thus, there is no real need to determine the value of $\varphi$ at non-dyadic points.
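
The dyadic refinement just described translates almost directly into a recursive function. The following is a minimal sketch, assuming the integer values (9.157) and the support $[0, 3]$ noted below; the name `phi` and the use of exact `Fraction` arguments are choices made for this illustration, not part of the text.

```python
import math
from fractions import Fraction
from functools import lru_cache

S3 = math.sqrt(3.0)
# Daubechies coefficients c_0, ..., c_3 from (9.153).
C = ((1 + S3) / 4, (3 + S3) / 4, (3 - S3) / 4, (1 - S3) / 4)
# Values at the integers from (9.157); all other integers give 0.
INTEGER_VALUES = {1: (1 + S3) / 2, 2: (1 - S3) / 2}

@lru_cache(maxsize=None)
def phi(x):
    """Daubechies scaling function at a dyadic rational x = m / 2**j."""
    x = Fraction(x)
    d = x.denominator
    if d & (d - 1) != 0:
        raise ValueError("x must be a dyadic rational")
    if x <= 0 or x >= 3:
        return 0.0                        # phi vanishes outside (0, 3)
    if d == 1:
        return INTEGER_VALUES.get(x.numerator, 0.0)
    # (9.153): each argument 2x - k has half the denominator of x.
    return sum(c * phi(2 * x - k) for k, c in enumerate(C))

# Values on the grid of spacing 1/16 over [0, 3].
table = [(Fraction(m, 16), phi(Fraction(m, 16))) for m in range(49)]
```

Exact rational arguments make the stopping test `d == 1` reliable; with floating point arguments one would instead have to track the level $j$ explicitly.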

The preceding scheme was used to produce the graphs of the Daubechies scaling function in Figure 9.10. It is a continuous, but non-differentiable, function, and its graph has a very jagged, fractal-like appearance when viewed at close range. The Daubechies scaling function is, in fact, a close relative of the famous example of a continuous, nowhere differentiable function originally due to Weierstrass, [42, 53], whose construction also relies on a similar scaling argument.

Given the values of the Daubechies scaling function on a sufficiently dense set of dyadic points, the consequential values of the mother wavelet are given by formula (9.154). Note that supp $\varphi$ = supp $w$ = $[0, 3]$. The daughter wavelets are then found by the usual compression and translation procedure (9.143).
Figure 9.11. Daubechies Wavelet Expansion.


The Daubechies wavelet expansion of a function whose support is contained in† $[0, 1]$ is then given by
$$f(x) \sim c_0\,\varphi(x) + \sum_{j=0}^{\infty} \sum_{k=-2}^{2^j - 1} c_{j,k}\,w_{j,k}(x). \tag{9.158}$$
The inner summation begins at $k = -2$ so as to include all the wavelet offspring $w_{j,k}$ whose supports have a nontrivial intersection with the interval $[0, 1]$. The wavelet coefficients $c_0$, $c_{j,k}$ are computed by the usual orthogonality formula


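$$\begin{aligned}
c_0 &= \langle f, \varphi \rangle = \int_0^1 f(x)\,\varphi(x)\,dx, \\
c_{j,k} &= \frac{\langle f, w_{j,k} \rangle}{\|w_{j,k}\|^2} = 2^j \int_{2^{-j}k}^{2^{-j}(k+3)} f(x)\,w_{j,k}(x)\,dx = \int_0^3 f\big(2^{-j}(x+k)\big)\,w(x)\,dx,
\end{aligned} \tag{9.159}$$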


where  we  agree  that  f(x)  =  0  whenever  x  <  0  or  x  >  1.  In  practice,  one  employs 
a  numerical  integration  procedure,  e.g.,  the  trapezoid  rule,  based  on  dyadic  points  to 
speedily  evaluate  the  integrals  (9.159).  A  proof  of  completeness  of  the  resulting  wavelet 
basis  functions  can  be  found  in  [18].  Compression  and  denoising  algorithms  based  on 
retaining  only  low-frequency  modes  proceed  as  in  Section  5.6,  and  are  left  as  projects  for 
the  motivated  reader  to  implement. 


Example 9.61. In Figure 9.11, we plot the Daubechies wavelet expansions of the same signal as in Example 9.56. The first plot is the original signal, and the following show the partial sums of (9.158) over $j = 0, \ldots, r$ with $r = 2, 3, 4, 5, 6$. Unlike the Haar expansion, the Daubechies wavelets do exhibit a nonuniform Gibbs phenomenon, where the expansion noticeably overshoots near the discontinuity, [61], which can be observed at the interior discontinuity as well as the endpoints, since the function is set to 0 outside the interval

† For functions with larger support, one should include additional terms in the expansion corresponding to further translates of the wavelets so as to cover the entire support of the function. Alternatively, one can translate and rescale $x$ to fit the function’s support inside $[0, 1]$.




[0,1].  Indeed,  the  Daubechies  wavelets  are  continuous,  and  so  cannot  converge  uniformly 
to  a  discontinuous  function. 


Exercises 


4b  9.7.8.  Answer  Exercises  9.7.1  and  9.7.2  using  the  Daubechies  wavelets  instead  of  the  Haar 
wavelets.  Do  you  see  any  improvement  in  your  approximations?  Discuss  the  advantages 
and  disadvantages  of  both  in  light  of  these  examples. 


4b  9.7.9.  Answer  Exercise  9.7.3  using  the  Daubechies  wavelets  to  compress  the  data.  Compare 
your  results. 

0  9.7.10.  Verify  formulas  (9.139)  and  (9.141). 

9.7.11. Prove that the most general solution to the functional equation $\varphi(x) = 2\,\varphi(2x)$ is $\varphi(x) = f(\log_2 x)/x$, where $f(z+1) = f(z)$ is any 1-periodic function.

0 9.7.12. Consider the dilation equation (9.138) with $c_0 = 0$, $c_1 = c_2 = 1$, so $\varphi(x) = \varphi(2x-1) + \varphi(2x-2)$. Prove that $\psi(x) = \varphi(x+1)$ satisfies the Haar dilation equation (9.139). Generalize this result to prove that we can always, without loss of generality, assume that $c_0 \ne 0$ in the general dilation equation (9.138).

9.7.13. Prove that a cubic B-spline, as defined in Exercise 5.5.76, solves the dilation equation (9.138) for $c_0 = c_4 = \tfrac18$, $c_1 = c_3 = \tfrac12$, $c_2 = \tfrac34$.

9.7.14.  Explain  why  the  scaling  function  <p(x)  and  the  mother  wavelet  w(x)  have  the  same 
support:  supp  <p  =  supp  w. 


9.7.15. Prove that (9.147) implies $\langle \varphi(x-l), \varphi(x-m) \rangle = 0$ for all $l \ne m$.


0 9.7.16. Let $\varphi(x)$ be any scaling function, $w(x)$ the corresponding mother wavelet, and $w_{j,k}(x)$ the wavelet descendants. Prove that (a) $\|\varphi\| = \|w\|$; (b) $\|w_{j,k}\| = 2^{-j/2}\,\|\varphi\|$.


0 9.7.17. (a) Prove that the scaling function $\varphi(x)$ and the mother wavelet $w(x)$ are orthogonal. (b) Prove that the integer translates $w(x-m)$ of the mother wavelet are mutually orthogonal. (c) Prove orthogonality of all the wavelet offspring $w_{j,k}(x)$.


9.7.18.  Find  the  values  of  the  Daubechies  scaling  function  ip(x)  and  mother  wavelet  w(x)  at 
x=  (a)  i,  (b)  i,  (c) 

0  9.7.19.  Prove  the  formulas  in  Proposition  9.60  for  the  norms  of  the  mother  and  daughter 
wavelets. 


4*  9.7.20.  Write  a  computer  program  to  zoom  in  on  the  Daubechies  scaling  function  and  discuss 
what  you  see. 

9.7.21.  True  or  false:  The  iterative  system  (9.156)  is  a  Markov  process. 

0  9.7.22.  Let  <p(x)  satisfy  the  Daubechies  scaling  equation  (9.153).  Prove  that  if  ip(i)  /  0  for  any 
i  <  0  or  i  >  p,  then  supp  is  unbounded. 

9.7.23.  (a)  Use  (9.142)  to  construct  the  “mother  wavelet”  corresponding  to  the  hat  function 
(9.140).  (b)  Is  the  hat  function  orthogonal  to  the  mother  wavelet?  (c)  Is  the  hat  function 
orthogonal  to  its  integer  translates? 

9.7.24.  Prove  that  a  real  number  x  is  dyadic  if  and  only  if  its  binary  (base  2)  expansion 
terminates,  i.e.,  is  eventually  all  zeros. 

_ o 

9.7.25.  Find  dyadic  approximations,  with  error  at  most  2  ,  to 

(a)  (b)  jj,  (c)  \/2,  (d)  e,  (e)  n. 



Chapter  10 
Dynamics 


In this chapter, we turn our attention to continuous dynamical systems, which are governed by first and second order linear systems of ordinary differential equations. Such systems, whose unvarying equilibria were the subject of Chapter 6, include the dynamical motions of mass-spring chains and structures, and the time-varying voltages and currents in simple electrical circuits. Dynamics of continuous media, including fluids, solids, and gases, are modeled by infinite-dimensional dynamical systems described by partial differential equations, [61, 79], and will not be treated in this text, nor will we venture into the vastly more complicated realm of nonlinear dynamics, [34, 41].

Chapter 8 developed the basic mathematical tools, eigenvalues and eigenvectors, used in the analysis of linear systems of ordinary differential equations. For a first order system, the resulting eigensolutions describe the basic modes of exponential growth, decay, or periodic behavior. In particular, the stability properties of an equilibrium solution are (mostly) determined by the eigenvalues. Most of the phenomenology inherent in linear dynamics can already be observed in the two-dimensional situation, and we devote Section 10.3 to a complete description of first order planar linear systems. In Section 10.4, we re-interpret the solution to a first order system in terms of the matrix exponential, which is defined by analogy with the usual scalar exponential function. Matrix exponentials are particularly effective for solving inhomogeneous or forced linear systems, and also appear in applications to geometry, computer graphics and animation, theoretical physics, and mechanics.

As  a  consequence  of  Newton’s  laws  of  motion,  mechanical  vibrations  are  modeled  by 
second  order  dynamical  systems.  For  stable  configurations  with  no  frictional  damping,  the 
eigensolutions  constitute  the  system’s  normal  modes,  each  periodically  vibrating  with  its 
associated  natural  frequency.  The  full  dynamics  is  obtained  by  linear  superposition  of  the 
periodic  normal  modes,  but  the  resulting  solution  is,  typically,  no  longer  periodic.  Such 
quasi-periodic  motion  may  seem  quite  chaotic  —  even  though  mathematically  it  is  merely 
a  combination  of  finitely  many  simple  periodic  solutions.  When  subjected  to  an  external 
periodic  forcing,  the  system  usually  remains  in  a  quasi-periodic  motion  that  superimposes 
a  periodic  response  onto  its  own  internal  vibrations.  However,  attempting  to  force  the 
system  at  one  of  its  natural  frequencies,  as  prescribed  by  its  eigenvalues,  may  induce 
a  resonant  vibration,  of  progressively  unbounded  amplitude,  resulting  in  a  catastrophic 
breakdown of the physical apparatus. In contrast, frictional effects, depending on first order derivatives/velocities, serve to damp out the quasi-periodic vibrations and similarly help mitigate the dangers of resonance.


10.1  Basic  Solution  Techniques 

Our initial focus will be on systems
$$\frac{d\mathbf{u}}{dt} = A\,\mathbf{u} \tag{10.1}$$
consisting of $n$ first order linear ordinary differential equations in the $n$ unknowns $\mathbf{u}(t) = (u_1(t), \ldots, u_n(t))^T \in \mathbb{R}^n$. In an autonomous system, the time variable does not appear explicitly, and so the coefficient matrix $A$, of size $n \times n$, is a constant real† matrix. Non-autonomous systems, in which $A(t)$ is time-dependent, are considerably more difficult to analyze, and we refer the reader to a more advanced text such as [36].

As we saw in Section 8.1, a vector-valued exponential function
$$\mathbf{u}(t) = e^{\lambda t}\,\mathbf{v},$$
in which $\lambda$ is a constant scalar and $\mathbf{v}$ a constant vector, describes a solution to (10.1) if and only if
$$A\,\mathbf{v} = \lambda\,\mathbf{v}.$$
Hence, assuming $\mathbf{v} \ne \mathbf{0}$, the scalar $\lambda$ must be an eigenvalue of $A$, and $\mathbf{v}$ the corresponding eigenvector. The resulting exponential function will be called an eigensolution of the linear system. Since the system is linear and homogeneous, linear superposition allows us to combine the basic eigensolutions to form more general solutions.

If the coefficient matrix $A$ is complete (diagonalizable), then, by definition, its eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$ form a basis. The corresponding eigensolutions
$$\mathbf{u}_1(t) = e^{\lambda_1 t}\,\mathbf{v}_1, \quad \ldots, \quad \mathbf{u}_n(t) = e^{\lambda_n t}\,\mathbf{v}_n,$$
will form a basis for the solution space to the system. Hence, the general solution to a first order linear system with complete coefficient matrix has the form
$$\mathbf{u}(t) = c_1\,\mathbf{u}_1(t) + \cdots + c_n\,\mathbf{u}_n(t) = c_1 e^{\lambda_1 t}\,\mathbf{v}_1 + \cdots + c_n e^{\lambda_n t}\,\mathbf{v}_n, \tag{10.2}$$
where $c_1, \ldots, c_n$ are constants, which are uniquely prescribed by the initial conditions
$$\mathbf{u}(t_0) = \mathbf{u}_0. \tag{10.3}$$
This all follows from the basic existence and uniqueness theorem for ordinary differential equations, which will be discussed shortly.

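Example 10.1. Let us solve the coupled pair of ordinary differential equations
$$\frac{du}{dt} = 3u + v, \qquad \frac{dv}{dt} = u + 3v.$$
We first write the system in matrix form (10.1) with unknown $\mathbf{u}(t) = \begin{pmatrix} u(t) \\ v(t) \end{pmatrix}$ and coefficient matrix $A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$. According to Example 8.5, the eigenvalues and eigenvectors of $A$ are
$$\lambda_1 = 4, \quad \mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \lambda_2 = 2, \quad \mathbf{v}_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$
Both eigenvalues are simple, and so $A$ is a complete matrix. The resulting eigensolutions
$$\mathbf{u}_1(t) = e^{4t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} e^{4t} \\ e^{4t} \end{pmatrix}, \qquad \mathbf{u}_2(t) = e^{2t} \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} -e^{2t} \\ e^{2t} \end{pmatrix},$$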

† Extending the solution techniques to complex systems with complex coefficient matrices is straightforward, but will not be treated here.

form a basis of the solution space, and so the general solution is a linear combination
$$\mathbf{u}(t) = c_1 e^{4t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} + c_2 e^{2t} \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad \text{hence} \qquad \begin{aligned} u(t) &= c_1 e^{4t} - c_2 e^{2t}, \\ v(t) &= c_1 e^{4t} + c_2 e^{2t}, \end{aligned}$$
in which $c_1, c_2$ are arbitrary constants.
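
As a sanity check on Example 10.1, one can fit the constants to initial data and confirm numerically that the formula satisfies the differential equations. This is only an illustrative sketch; the function names `solve_example` and `residual` are hypothetical, and the derivative is approximated by a centered difference rather than computed exactly.

```python
import math

def solve_example(u0, v0, t):
    """Solution of du/dt = 3u + v, dv/dt = u + 3v with u(0) = u0, v(0) = v0."""
    # At t = 0 the general solution gives u0 = c1 - c2 and v0 = c1 + c2.
    c1 = (u0 + v0) / 2.0
    c2 = (v0 - u0) / 2.0
    u = c1 * math.exp(4 * t) - c2 * math.exp(2 * t)
    v = c1 * math.exp(4 * t) + c2 * math.exp(2 * t)
    return u, v

def residual(u0, v0, t, h=1e-6):
    """Largest mismatch between a centered-difference derivative and A u."""
    u, v = solve_example(u0, v0, t)
    up, vp = solve_example(u0, v0, t + h)
    um, vm = solve_example(u0, v0, t - h)
    du, dv = (up - um) / (2 * h), (vp - vm) / (2 * h)
    return max(abs(du - (3 * u + v)), abs(dv - (u + 3 * v)))

print(solve_example(1.0, 2.0, 0.0))   # reproduces the initial data
print(residual(1.0, 2.0, 0.3))        # should be close to zero
```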


The  Phase  Plane 


As  noted  above,  a  wide  variety  of  physical  systems  are  modeled  by  second  order  ordinary 
differential  equations.  Your  first  course  on  ordinary  differential  equations,  e.g.,  [7,22], 
covered  the  basic  solution  technique  for  constant  coefficient  scalar  equations,  which  we 
quickly  review  in  the  context  of  an  example. 

Example 10.2. To solve the homogeneous ordinary differential equation
$$\frac{d^2u}{dt^2} + \frac{du}{dt} - 6u = 0, \tag{10.4}$$
we begin with the exponential ansatz†
$$u(t) = e^{\lambda t},$$
where the constant factor $\lambda$ is to be determined. Substituting into the differential equation leads immediately to the characteristic equation
$$\lambda^2 + \lambda - 6 = 0, \qquad \text{with roots} \qquad \lambda_1 = 2, \quad \lambda_2 = -3.$$
Therefore, $e^{2t}$ and $e^{-3t}$ are individual solutions. Since the equation is of second order, Theorem 7.34 implies that they form a basis for the two-dimensional solution space, and hence the general solution can be written as a linear combination
$$u(t) = c_1 e^{2t} + c_2 e^{-3t}, \tag{10.5}$$
where $c_1, c_2$ are arbitrary constants.


There is a standard trick to convert a second order equation
$$\frac{d^2u}{dt^2} + \alpha\,\frac{du}{dt} + \beta\,u = 0 \tag{10.6}$$
into a first order system. One introduces the so-called phase plane variables‡
$$u_1 = u, \qquad u_2 = \dot u = \frac{du}{dt}. \tag{10.7}$$
Assuming $\alpha, \beta$ are constants, the phase plane variables satisfy
$$\frac{du_1}{dt} = \frac{du}{dt} = u_2, \qquad \frac{du_2}{dt} = \frac{d^2u}{dt^2} = -\beta\,u - \alpha\,\frac{du}{dt} = -\beta\,u_1 - \alpha\,u_2.$$


† See the footnote on p. 379 for an explanation of the term “ansatz”, a.k.a. “inspired guess”.
‡ We will often use dots as a shorthand notation for time derivatives.

In this manner, the second order equation (10.6) is converted into a first order system
$$\dot{\mathbf{u}} = A\,\mathbf{u}, \qquad \text{where} \qquad \mathbf{u}(t) = \begin{pmatrix} u_1(t) \\ u_2(t) \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 \\ -\beta & -\alpha \end{pmatrix}. \tag{10.8}$$
Every solution $u(t)$ to the second order equation yields a solution $\mathbf{u}(t) = (u(t), \dot u(t))^T$ to the first order system (10.8), whose second component is merely its time derivative. Conversely, if $\mathbf{u}(t) = (u_1(t), u_2(t))^T$ is any solution to (10.8), then its first component $u(t) = u_1(t)$ defines a solution to the original scalar equation (10.6). We conclude that the two are completely equivalent, in the sense that solving one will immediately resolve the other.
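
One consequence of this equivalence is that the eigenvalues of the phase plane matrix coincide with the roots of the characteristic equation $\lambda^2 + \alpha\lambda + \beta = 0$. The sketch below checks this for the equation of Example 10.2; the helper names are hypothetical, and the eigenvalues are computed from the trace and determinant of a $2 \times 2$ matrix rather than by a library routine.

```python
import cmath

def phase_plane_matrix(alpha, beta):
    """Coefficient matrix of (10.8) for u'' + alpha u' + beta u = 0."""
    return [[0.0, 1.0], [-beta, -alpha]]

def eigenvalues_2x2(A):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)   # complex square root handles spirals
    return (tr + root) / 2.0, (tr - root) / 2.0

# Example 10.2: u'' + u' - 6u = 0, so alpha = 1 and beta = -6.
A = phase_plane_matrix(1.0, -6.0)
lam1, lam2 = eigenvalues_2x2(A)
print(A, lam1, lam2)
```

Using `cmath.sqrt` means the same helper also covers equations whose characteristic roots are complex.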

The variables $(u_1, u_2)^T = (u, \dot u)^T$ serve as coordinates in the phase plane $\mathbb{R}^2$. The solutions $\mathbf{u}(t)$ parameterize curves in the phase plane, known as the solution trajectories or orbits. In particular, the equilibrium solution $\mathbf{u}(t) \equiv \mathbf{0}$ remains fixed at the origin, and so its trajectory is a single point. Assuming $\beta \ne 0$, every other solution describes a genuine curve, whose tangent direction $\dot{\mathbf{u}} = d\mathbf{u}/dt$ at a point $\mathbf{u}$ is prescribed by the right-hand side of the differential equation, namely $\dot{\mathbf{u}} = A\,\mathbf{u}$. The collection of all possible
solution  trajectories  is  called  the  phase  portrait  of  the  system.  An  important  fact  is  that,  in 
an  autonomous  first  order  system,  the  phase  plane  trajectories  never  cross.  This  striking 
property,  which  is  also  valid  for  nonlinear  systems,  is  a  consequence  of  the  uniqueness 
properties  of  solutions,  [7,  36].  Thus,  the  phase  portrait  consists  of  a  family  of  non¬ 
intersecting  curves  that,  when  combined  with  the  equilibrium  points,  fill  out  the  entire 
phase  plane.  The  direction  of  motion  along  a  trajectory  will  be  indicated  graphically  by 
a  small  arrow;  nearby  trajectories  are  all  traversed  in  the  same  direction.  The  one  feature 
that  is  not  so  easily  pictured  in  the  phase  portrait  is  the  continuously  varying  speed  at 
which  the  solution  moves  along  its  trajectory.  Plotting  this  requires  a  more  complicated 
three-dimensional  diagram  using  time  as  the  third  coordinate. 

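Example 10.2 (continued). For the second order equation (10.4), the equivalent phase plane system is
$$\frac{d\mathbf{u}}{dt} = \begin{pmatrix} 0 & 1 \\ 6 & -1 \end{pmatrix} \mathbf{u}, \qquad \text{or, in full detail,} \qquad \begin{aligned} \dot u_1 &= u_2, \\ \dot u_2 &= 6u_1 - u_2. \end{aligned} \tag{10.9}$$
Our previous solution formula (10.5) implies that the solution to the phase plane system (10.9) is given by
$$u_1(t) = u(t) = c_1 e^{2t} + c_2 e^{-3t}, \qquad u_2(t) = \frac{du}{dt} = 2c_1 e^{2t} - 3c_2 e^{-3t},$$
and hence
$$\mathbf{u}(t) = \begin{pmatrix} c_1 e^{2t} + c_2 e^{-3t} \\ 2c_1 e^{2t} - 3c_2 e^{-3t} \end{pmatrix} = c_1 \begin{pmatrix} e^{2t} \\ 2e^{2t} \end{pmatrix} + c_2 \begin{pmatrix} e^{-3t} \\ -3e^{-3t} \end{pmatrix}. \tag{10.10}$$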


A sketch of the phase portrait, indicating several representative trajectories, appears in Figure 10.1. The solutions with $c_2 = 0$ go out to $\infty$ along the two rays in the directions $(1, 2)^T$ and $(-1, -2)^T$, whereas those with $c_1 = 0$ come in to the origin along the rays in the directions $(1, -3)^T$ and $(-1, 3)^T$. All other non-equilibrium solutions move along hyperbolic trajectories whose asymptotes, in forward and backward time, are one of these four rays.
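
Trajectories of the phase plane system (10.9) can also be traced numerically and compared with the exact formula (10.10). The sketch below uses the classical fourth order Runge-Kutta method, which is a choice made for this illustration and is not discussed in the text; the function names are hypothetical.

```python
import math

def rhs(u):
    """Right-hand side of (10.9): u1' = u2, u2' = 6 u1 - u2."""
    return (u[1], 6.0 * u[0] - u[1])

def rk4_trajectory(u0, t_end, steps):
    """Integrate (10.9) from t = 0 to t_end; returns the list of visited points."""
    h = t_end / steps
    u = tuple(u0)
    path = [u]
    for _ in range(steps):
        k1 = rhs(u)
        k2 = rhs((u[0] + h / 2 * k1[0], u[1] + h / 2 * k1[1]))
        k3 = rhs((u[0] + h / 2 * k2[0], u[1] + h / 2 * k2[1]))
        k4 = rhs((u[0] + h * k3[0], u[1] + h * k3[1]))
        u = (u[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             u[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        path.append(u)
    return path

def exact(c1, c2, t):
    """Exact phase plane solution (10.10)."""
    return (c1 * math.exp(2 * t) + c2 * math.exp(-3 * t),
            2 * c1 * math.exp(2 * t) - 3 * c2 * math.exp(-3 * t))

# A generic trajectory (c1 = c2 = 1) and one starting on the outgoing ray (1, 2).
generic = rk4_trajectory(exact(1.0, 1.0, 0.0), 1.0, 1000)
on_ray = rk4_trajectory((1.0, 2.0), 1.0, 100)
```

Plotting the points of `generic` in the $(u_1, u_2)$ plane should reproduce one of the curved trajectories, while `on_ray` should stay on the line $u_2 = 2u_1$.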


With  some  practice,  one  learns  to  understand  the  temporal  behavior  of  the  solution  by 
studying  its  phase  plane  trajectory.  We  will  investigate  the  qualitative  and  quantitative 
behavior  of  phase  plane  systems  in  depth  in  Section  10.3. 


Exercises 


10.1.1. Choose one or more of the following differential equations, and then: (a) Solve the equation directly. (b) Write down its phase plane equivalent, and the general solution to the phase plane system. (c) Plot at least four representative trajectories to illustrate the phase portrait. (d) Choose two trajectories in your phase portrait and graph the corresponding solution curves $u(t)$. Explain in your own words how the orbit and the solution graph are related. (i) $\ddot u + 4u = 0$, (ii) $\ddot u - 4u = 0$, (iii) $\ddot u + 2\dot u + u = 0$, (iv) $\ddot u + 4\dot u + 3u = 0$, (v) $\ddot u - 2\dot u + 10u = 0$.


10.1.2. (a) Convert the third order equation $\dfrac{d^3u}{dt^3} + 3\,\dfrac{d^2u}{dt^2} + 4\,\dfrac{du}{dt} + 12u = 0$ into a first order system in three variables by setting $u_1 = u$, $u_2 = \dot u$, $u_3 = \ddot u$. (b) Solve the equation directly, and then use this to write down the general solution to your first order system. (c) What is the dimension of the solution space?


10.1.3.  Convert  the  second  order  coupled  system  of  ordinary  differential  equations 

u  =  au  +  bv  +  cu  +  dv,  v  =  pu  +  qv  +  ru  +  sv, 
into  a  first  order  system  involving  four  variables. 

0 10.1.4. (a) Show that if $\mathbf{u}(t)$ solves $\dot{\mathbf{u}} = A\,\mathbf{u}$, then its time reversal, defined as $\mathbf{v}(t) = \mathbf{u}(-t)$, solves $\dot{\mathbf{v}} = B\,\mathbf{v}$, where $B = -A$. (b) Explain why the two systems have the same phase portraits, but the direction of motion along the trajectories is reversed. (c) Apply time reversal to the system(s) you derived in Exercise 10.1.1. (d) What is the effect of time reversal on the original second order equation?


♥ 10.1.5. A first order linear system $\dot u = a\,u + b\,v$, $\dot v = c\,u + d\,v$, can be converted into a single
second order differential equation by the following device. Assuming that $b \ne 0$, solve the
system for $v$ and $\dot v$ in terms of $u$ and $\dot u$. Then differentiate your equation for $v$ with respect
to $t$, and eliminate $\dot v$ from the resulting pair of equations. The result is a second order
ordinary differential equation for $u(t)$. (a) Write out the second order equation in terms
of the coefficients $a, b, c, d$ of the first order system. (b) Show that there is a one-to-one
correspondence between solutions of the system and solutions of the scalar differential
equation. (c) Use this method to solve the following linear systems, and sketch the
resulting phase portraits: (i) $\dot u = v$, $\dot v = -u$, (ii) $\dot u = 2u + 5v$, $\dot v = -u$, (iii) $\dot u = 4u - v$,
$\dot v = 6u - 3v$, (iv) $\dot u = u + v$, $\dot v = u - v$, (v) $\dot u = v$, $\dot v = 0$. (d) Show how to obtain
a second order equation satisfied by $v(t)$ by an analogous device. Are the second order
equations for $u$ and for $v$ the same? (e) Discuss how you might proceed if $b = 0$.
10.1.6. (a) Show that if $\mathbf{u}(t)$ solves $\dot{\mathbf{u}} = A\mathbf{u}$, then $\mathbf{v}(t) = \mathbf{u}(2t)$ solves $\dot{\mathbf{v}} = B\mathbf{v}$, where $B = 2A$.
(b) How are the solution trajectories of the two systems related?

10.1.7. Let $A$ be a constant $n \times n$ matrix. Let $\mathbf{u}(t)$ be a solution to the system $\dfrac{d\mathbf{u}}{dt} = A\mathbf{u}$.

(a) Show that its derivatives $\dfrac{d^k\mathbf{u}}{dt^k}$ for $k = 1, 2, \ldots$, are also solutions.

(b) Show that $\dfrac{d^k\mathbf{u}}{dt^k} = A^k\mathbf{u}$.


10.1.8.  True  or  false:  Each  solution  to  a  phase  plane  system  moves  at  a  constant  speed  along 
its  trajectory. 


10.1.9. True or false: The phase plane trajectories (10.10) for $(c_1, c_2)^T \ne \mathbf{0}$ are hyperbolas.

10.1.10. Use a three-dimensional graphics package to plot solution curves $(t, u_1(t), u_2(t))^T$ of
the  phase  plane  systems  in  Exercise  10.1.1.  Discuss  their  shape  and  explain  how  they  are 
related  to  the  phase  plane  trajectories. 


Existence  and  Uniqueness 

Before  delving  further  into  our  subject,  it  will  help  to  briefly  summarize  the  basic  existence 
and  uniqueness  theorems  as  they  apply  to  linear  systems  of  ordinary  differential  equations. 
Even  though  we  will  study  only  the  constant  coefficient  case  in  detail,  these  results  are 
equally applicable to non-autonomous systems, and so, but only in this subsection,
we allow the coefficient matrix to depend continuously on $t$. A key fact is that a system
of  n  first  order  ordinary  differential  equations  requires  n  initial  conditions  —  one  for  each 
variable  —  in  order  to  uniquely  specify  its  solution.  More  precisely: 

Theorem 10.3. Let $A(t)$ be an $n \times n$ matrix and $\mathbf{f}(t)$ an $n$-component column vector, each
of whose entries is a continuous function on the interval† $a < t < b$. Set an initial time
$a < t_0 < b$ and an initial vector $\mathbf{b} \in \mathbb{R}^n$. Then the initial value problem

$$\frac{d\mathbf{u}}{dt} = A(t)\,\mathbf{u} + \mathbf{f}(t), \qquad \mathbf{u}(t_0) = \mathbf{b}, \tag{10.11}$$

admits a unique solution $\mathbf{u}(t)$ that is defined for all $a < t < b$.

For  completeness,  we  have  included  an  inhomogeneous  forcing  term  f  (t)  in  the  system. 
We will not prove Theorem 10.3, which is a direct consequence of the more general existence
and uniqueness theorem for nonlinear systems of ordinary differential equations. Full
details can be found in most texts on ordinary differential equations, including [7, 22, 36].
In the homogeneous case, when $\mathbf{f}(t) = \mathbf{0}$, uniqueness of solutions implies that the solution
with zero initial conditions, $\mathbf{u}(t_0) = \mathbf{0}$, is the trivial zero solution: $\mathbf{u}(t) = \mathbf{0}$ for all $t$. In
other words, if you start at an equilibrium, you remain there for all time. Moreover, you
can never arrive at equilibrium in a finite amount of time, since if $\mathbf{u}(t_1) = \mathbf{0}$, then, again
by uniqueness, $\mathbf{u}(t) = \mathbf{0}$ for all $t < t_1$ (and $t > t_1$, too).

Uniqueness  has  another  important  consequence:  linear  independence  of  solutions  needs 
be  checked  only  at  a  single  point. 


† We allow $a$ and $b$ to be infinite.
Lemma 10.4. The solutions $\mathbf{u}_1(t), \ldots, \mathbf{u}_k(t)$ to a first order homogeneous linear system
$\dot{\mathbf{u}} = A(t)\,\mathbf{u}$ are linearly independent if and only if their initial values $\mathbf{u}_1(t_0), \ldots, \mathbf{u}_k(t_0)$ are
linearly independent vectors in $\mathbb{R}^n$.


Proof: If the solutions were linearly dependent, one could find (constant) scalars $c_1, \ldots, c_k$,
not all zero, such that

$$\mathbf{u}(t) = c_1\mathbf{u}_1(t) + \cdots + c_k\mathbf{u}_k(t) \equiv \mathbf{0}. \tag{10.12}$$

The equation holds, in particular, at $t = t_0$:

$$\mathbf{u}(t_0) = c_1\mathbf{u}_1(t_0) + \cdots + c_k\mathbf{u}_k(t_0) = \mathbf{0}. \tag{10.13}$$

This immediately proves linear dependence of the initial vectors.

Conversely, if the initial values $\mathbf{u}_1(t_0), \ldots, \mathbf{u}_k(t_0)$ are linearly dependent, then (10.13)
holds for some $c_1, \ldots, c_k$, not all zero. Linear superposition implies that the self-same
linear combination $\mathbf{u}(t) = c_1\mathbf{u}_1(t) + \cdots + c_k\mathbf{u}_k(t)$ is a solution to the system, with zero
initial condition. By uniqueness, $\mathbf{u}(t) = \mathbf{0}$ for all $t$, and so (10.12) holds, proving linear
dependence of the solutions. Q.E.D.


Warning. This result is not true if the functions are not solutions to a first order
linear system. For example,

$$\mathbf{u}_1(t) = \begin{pmatrix} 1 \\ t \end{pmatrix}, \qquad \mathbf{u}_2(t) = \begin{pmatrix} \cos t \\ \sin t \end{pmatrix},$$

are linearly independent vector-valued functions, but, at time $t = 0$, the vectors
$\mathbf{u}_1(0) = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \mathbf{u}_2(0)$ are linearly dependent. Even worse,

$$\mathbf{u}_1(t) = \begin{pmatrix} 1 \\ t \end{pmatrix}, \qquad \mathbf{u}_2(t) = \begin{pmatrix} t \\ t^2 \end{pmatrix},$$

define linearly dependent vectors at every specified value of $t$. Nevertheless, as vector-valued
functions, they are linearly independent. (Why?) In view of Lemma 10.4, neither pair of
vector-valued functions can be solutions to a common first order homogeneous linear system.
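The distinction drawn in the warning can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `pointwise_rank` and `functional_rank` are hypothetical and introduced only for this illustration. It compares the rank of the vectors at a fixed time with the rank of the functions sampled on a grid of times. Full rank of the sampled columns certifies that the functions are linearly independent, since a linear dependence among the functions would force the same dependence among their samples.

```python
import numpy as np

def pointwise_rank(funcs, t):
    """Rank of the matrix whose columns are the vectors f(t) at one fixed time t."""
    return np.linalg.matrix_rank(np.column_stack([f(t) for f in funcs]))

def functional_rank(funcs, times):
    """Rank of the functions sampled on a grid: each function becomes one long column."""
    cols = [np.concatenate([f(t) for t in times]) for f in funcs]
    return np.linalg.matrix_rank(np.column_stack(cols))

# First pair from the warning: independent as functions, dependent at t = 0.
pair_a = [lambda t: np.array([1.0, t]),
          lambda t: np.array([np.cos(t), np.sin(t)])]

# Second pair: dependent at every fixed t, yet independent as functions.
pair_b = [lambda t: np.array([1.0, t]),
          lambda t: np.array([t, t**2])]

times = np.linspace(-1.0, 1.0, 9)

rank_a_at_0 = pointwise_rank(pair_a, 0.0)
rank_a_func = functional_rank(pair_a, times)
ranks_b_pointwise = [pointwise_rank(pair_b, t) for t in times]
rank_b_func = functional_rank(pair_b, times)

print(rank_a_at_0, rank_a_func, ranks_b_pointwise, rank_b_func)
```

For genuine solutions of a common homogeneous linear system, Lemma 10.4 says the two notions of rank must agree at every time, which is exactly what fails here.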


The  next  result  tells  us  how  many  different  solutions  are  required  in  order  to  construct 
the  general  solution  by  linear  superposition. 


Theorem 10.5. Let $\mathbf{u}_1(t), \ldots, \mathbf{u}_n(t)$ be $n$ linearly independent solutions to the homogeneous
system of $n$ first order linear ordinary differential equations $\dot{\mathbf{u}} = A(t)\,\mathbf{u}$. Then the
general solution is a linear combination $\mathbf{u}(t) = c_1\mathbf{u}_1(t) + \cdots + c_n\mathbf{u}_n(t)$ depending on $n$
arbitrary constants $c_1, \ldots, c_n$.

Proof: If we have $n$ linearly independent solutions $\mathbf{u}_1(t), \ldots, \mathbf{u}_n(t)$, then Lemma 10.4 implies
that, at the initial time $t_0$, the vectors $\mathbf{u}_1(t_0), \ldots, \mathbf{u}_n(t_0)$ are linearly independent, and
hence form a basis for $\mathbb{R}^n$. This means that we can express an arbitrary initial condition

$$\mathbf{u}(t_0) = \mathbf{b} = c_1\mathbf{u}_1(t_0) + \cdots + c_n\mathbf{u}_n(t_0)$$

as a linear combination of the initial vectors. Superposition and uniqueness of solutions
imply that the corresponding solution to the initial value problem (10.11) is given by the
same linear combination

$$\mathbf{u}(t) = c_1\mathbf{u}_1(t) + \cdots + c_n\mathbf{u}_n(t).$$

We conclude that every solution to the ordinary differential equation can be written in the
prescribed form, which thus forms the general solution. Q.E.D.
Complete Systems

Thus, given a system of $n$ homogeneous linear differential equations $\dot{\mathbf{u}} = A\mathbf{u}$, the immediate
goal is to find $n$ linearly independent solutions. Each eigenvalue $\lambda$ and eigenvector $\mathbf{v}$ of
its (constant) coefficient matrix $A$ leads to an exponential eigensolution $\mathbf{u}(t) = e^{\lambda t}\mathbf{v}$. The
eigensolutions will be linearly independent if and only if the eigenvectors are; this follows
directly from Lemma 10.4. Thus, if the $n \times n$ matrix admits an eigenvector basis, i.e., it
is complete, then we have the requisite number of solutions, and hence have solved the
differential equation.

Theorem 10.6. If the $n \times n$ matrix $A$ is complete, then the general (complex) solution
to the autonomous linear system $\dot{\mathbf{u}} = A\mathbf{u}$ is given by

$$\mathbf{u}(t) = c_1 e^{\lambda_1 t}\mathbf{v}_1 + \cdots + c_n e^{\lambda_n t}\mathbf{v}_n, \tag{10.14}$$

where $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are the eigenvector basis, $\lambda_1, \ldots, \lambda_n$ the corresponding eigenvalues. The
constants $c_1, \ldots, c_n$ are uniquely specified by the initial conditions $\mathbf{u}(t_0) = \mathbf{b}$.

Proof: Since the eigenvectors are linearly independent, the eigensolutions define linearly
independent vectors $\mathbf{u}_1(0) = \mathbf{v}_1, \ldots, \mathbf{u}_n(0) = \mathbf{v}_n$ at the initial time $t = 0$. Lemma 10.4
implies that the eigensolutions $\mathbf{u}_1(t), \ldots, \mathbf{u}_n(t)$ are, indeed, linearly independent. Hence,
we know $n$ linearly independent solutions, and the result is an immediate consequence of
Theorem 10.5. Q.E.D.


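Example 10.7. Let us solve the initial value problem

$$\dot u_1 = -2u_1 + u_2, \qquad u_1(0) = 3,$$
$$\dot u_2 = 2u_1 - 3u_2, \qquad u_2(0) = 0.$$

The coefficient matrix is $A = \begin{pmatrix} -2 & 1 \\ 2 & -3 \end{pmatrix}$. A straightforward calculation produces the following
eigenvalues and eigenvectors:

$$\lambda_1 = -4, \quad \mathbf{v}_1 = \begin{pmatrix} 1 \\ -2 \end{pmatrix}, \qquad \lambda_2 = -1, \quad \mathbf{v}_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

Theorem 10.6 assures us that the corresponding eigensolutions

$$\mathbf{u}_1(t) = e^{-4t}\begin{pmatrix} 1 \\ -2 \end{pmatrix}, \qquad \mathbf{u}_2(t) = e^{-t}\begin{pmatrix} 1 \\ 1 \end{pmatrix},$$

form a basis for the two-dimensional solution space. The general solution is an arbitrary
linear combination

$$\mathbf{u}(t) = \begin{pmatrix} u_1(t) \\ u_2(t) \end{pmatrix} = c_1 e^{-4t}\begin{pmatrix} 1 \\ -2 \end{pmatrix} + c_2 e^{-t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} c_1 e^{-4t} + c_2 e^{-t} \\ -2c_1 e^{-4t} + c_2 e^{-t} \end{pmatrix},$$

where $c_1, c_2$ are constant scalars. Once we have the general solution in hand, the final step
is to determine the values of $c_1, c_2$ in order to satisfy the initial conditions. Evaluating the
solution at $t = 0$, we find that we need to solve the linear system

$$u_1(0) = c_1 + c_2 = 3, \qquad u_2(0) = -2c_1 + c_2 = 0,$$

for $c_1 = 1$, $c_2 = 2$. Thus, the (unique) solution to the initial value problem is

$$u_1(t) = e^{-4t} + 2e^{-t}, \qquad u_2(t) = -2e^{-4t} + 2e^{-t}. \tag{10.15}$$

Note that both components of the solution decay exponentially fast to 0 as $t \to \infty$.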
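The recipe of Theorem 10.6 translates directly into a few lines of numerical code. The following is a minimal sketch, assuming NumPy is available and that the coefficient matrix is complete; the function name `solve_complete_system` is hypothetical. It expands the initial vector in the eigenvector basis and evaluates the combination (10.14), then compares the result with the closed form (10.15). Note that NumPy normalizes its eigenvectors, so the computed coefficients need not equal $c_1 = 1$, $c_2 = 2$, although the resulting solution is the same.

```python
import numpy as np

def solve_complete_system(A, b, t, t0=0.0):
    """Evaluate u(t) for u' = A u, u(t0) = b, assuming A has an eigenvector basis."""
    lam, V = np.linalg.eig(A)              # columns of V are eigenvectors of A
    c = np.linalg.solve(V, b)              # coefficients with b = c_1 v_1 + ... + c_n v_n
    return V @ (c * np.exp(lam * (t - t0)))  # sum of c_k e^{lam_k (t - t0)} v_k

# Coefficient matrix and initial data of Example 10.7.
A = np.array([[-2.0, 1.0],
              [2.0, -3.0]])
b = np.array([3.0, 0.0])

def closed_form(t):
    """The solution (10.15) found by hand."""
    return np.array([np.exp(-4 * t) + 2 * np.exp(-t),
                     -2 * np.exp(-4 * t) + 2 * np.exp(-t)])

for t in (0.0, 0.5, 2.0):
    print(t, solve_complete_system(A, b, t), closed_form(t))
```

Solving the linear system `V c = b` is the numerical counterpart of the step in which the initial conditions determine $c_1, c_2$.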
Example 10.8. Consider the linear initial value problem

$$\dot u_1 = u_1 + 2u_2, \qquad u_1(0) = 2,$$
$$\dot u_2 = u_2 - 2u_3, \qquad u_2(0) = -1,$$
$$\dot u_3 = 2u_1 + 2u_2 - u_3, \qquad u_3(0) = -2.$$

The coefficient matrix is $A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & -2 \\ 2 & 2 & -1 \end{pmatrix}$. In Example 8.9, we computed its eigenvalues
and eigenvectors:

$$\lambda_1 = -1, \quad \mathbf{v}_1 = \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}, \qquad \lambda_2 = 1 + 2i, \quad \mathbf{v}_2 = \begin{pmatrix} 1 \\ i \\ 1 \end{pmatrix}, \qquad \lambda_3 = 1 - 2i, \quad \mathbf{v}_3 = \begin{pmatrix} 1 \\ -i \\ 1 \end{pmatrix}.$$

The corresponding eigensolutions to the system are

$$\mathbf{u}_1(t) = e^{-t}\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}, \qquad \widehat{\mathbf{u}}_2(t) = e^{(1+2i)t}\begin{pmatrix} 1 \\ i \\ 1 \end{pmatrix}, \qquad \widehat{\mathbf{u}}_3(t) = e^{(1-2i)t}\begin{pmatrix} 1 \\ -i \\ 1 \end{pmatrix}.$$

The first solution is real, but the second and third, while perfectly valid solutions, are
complex-valued, and hence not as convenient to work with if, as in most applications, we
are ultimately after real functions. But, since the underlying linear system is real, the
general reality principle of Theorem 7.48 tells us that a complex solution can be broken up
into its real and imaginary parts, each of which is a real solution. Here, applying Euler’s
formula (3.92) to the complex exponential, we obtain

$$\widehat{\mathbf{u}}_2(t) = e^{(1+2i)t}\begin{pmatrix} 1 \\ i \\ 1 \end{pmatrix} = \left(e^t\cos 2t + i\,e^t\sin 2t\right)\begin{pmatrix} 1 \\ i \\ 1 \end{pmatrix} = \begin{pmatrix} e^t\cos 2t \\ -e^t\sin 2t \\ e^t\cos 2t \end{pmatrix} + i\begin{pmatrix} e^t\sin 2t \\ e^t\cos 2t \\ e^t\sin 2t \end{pmatrix}.$$

The final two vector-valued functions are independent real solutions, as you can readily
check. In this manner, we have produced three linearly independent real solutions

$$\mathbf{u}_1(t) = \begin{pmatrix} -e^{-t} \\ e^{-t} \\ e^{-t} \end{pmatrix}, \qquad \mathbf{u}_2(t) = \begin{pmatrix} e^t\cos 2t \\ -e^t\sin 2t \\ e^t\cos 2t \end{pmatrix}, \qquad \mathbf{u}_3(t) = \begin{pmatrix} e^t\sin 2t \\ e^t\cos 2t \\ e^t\sin 2t \end{pmatrix},$$

which, by Theorem 10.5, form a basis for the three-dimensional solution space to our
system. The general solution can be written as a linear combination:

$$\mathbf{u}(t) = c_1\mathbf{u}_1(t) + c_2\mathbf{u}_2(t) + c_3\mathbf{u}_3(t) = \begin{pmatrix} -c_1 e^{-t} + c_2 e^t\cos 2t + c_3 e^t\sin 2t \\ c_1 e^{-t} - c_2 e^t\sin 2t + c_3 e^t\cos 2t \\ c_1 e^{-t} + c_2 e^t\cos 2t + c_3 e^t\sin 2t \end{pmatrix}.$$

The constants $c_1, c_2, c_3$ are uniquely prescribed by imposing initial conditions. In our case,
the solution satisfying

$$\mathbf{u}(0) = \begin{pmatrix} -c_1 + c_2 \\ c_1 + c_3 \\ c_1 + c_2 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \\ -2 \end{pmatrix} \qquad \text{results in} \qquad c_1 = -2, \quad c_2 = 0, \quad c_3 = 1.$$

Thus, the solution to the original initial value problem is

$$\mathbf{u}(t) = \begin{pmatrix} u_1(t) \\ u_2(t) \\ u_3(t) \end{pmatrix} = \begin{pmatrix} 2e^{-t} + e^t\sin 2t \\ -2e^{-t} + e^t\cos 2t \\ -2e^{-t} + e^t\sin 2t \end{pmatrix}.$$

Incidentally, the third complex eigensolution also produces two real solutions, but these
reproduce the ones we have already listed, since it is the complex conjugate of the second
eigensolution, and so $\widehat{\mathbf{u}}_3(t) = \mathbf{u}_2(t) - i\,\mathbf{u}_3(t)$. In general, when solving real systems, you
need to deal with only one eigenvalue from each complex conjugate pair to construct a
complete system of real solutions.
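The construction of a real basis from one member of a complex conjugate pair can be mirrored numerically. The following is a minimal sketch, assuming NumPy is available; the names `real_basis`, `u`, and `u_by_hand` are hypothetical. It takes the real eigensolution together with the real and imaginary parts of the eigensolution for the eigenvalue with positive imaginary part, fits the coefficients to the initial data of Example 10.8, and compares with the solution found above. Since NumPy scales its eigenvectors by an arbitrary complex factor, the individual basis functions may differ from those in the text, but they span the same solution space.

```python
import numpy as np

# Coefficient matrix and initial data of Example 10.8.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -2.0],
              [2.0, 2.0, -1.0]])
b = np.array([2.0, -1.0, -2.0])

lam, V = np.linalg.eig(A)
k = int(np.argmax(lam.imag))            # eigenvalue with positive imaginary part
r = int(np.argmin(np.abs(lam.imag)))    # the real eigenvalue

def real_basis(t):
    """Columns: the real eigensolution, then Re and Im of one complex eigensolution."""
    z = np.exp(lam[k] * t) * V[:, k]
    u1 = (np.exp(lam[r] * t) * V[:, r]).real
    return np.column_stack([u1, z.real, z.imag])

# Real coefficients fixed by the initial condition u(0) = b.
c = np.linalg.solve(real_basis(0.0), b)

def u(t):
    """Real solution of the initial value problem built from the real basis."""
    return real_basis(t) @ c

def u_by_hand(t):
    """The solution obtained in Example 10.8."""
    return np.array([2 * np.exp(-t) + np.exp(t) * np.sin(2 * t),
                     -2 * np.exp(-t) + np.exp(t) * np.cos(2 * t),
                     -2 * np.exp(-t) + np.exp(t) * np.sin(2 * t)])

for t in (0.0, 0.4, 1.5):
    print(t, u(t), u_by_hand(t))
```

Only one eigenvalue of the conjugate pair is used, in line with the closing remark of the example.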


Exercises 


10.1.11. Find the solution to the system of differential equations $\dfrac{du}{dt} = 3u + 4v$, $\dfrac{dv}{dt} = 4u - 3v$,
with initial conditions $u(0) = 3$ and $v(0) = -2$.

10.1.12.  Find  the  general  real  solution  to  the  following  systems  of  differential  equations: 


(a) 


U1  =  U1  +  9u2, 
u2  =  u±  +  3a2; 


(b) 


x1  =  Ax1  +  3x2 


x2  =  3x±  —  Ax 


(c) 


25 


Vl  =Vl  ~V2i 

V2  =  T  3 V 2': 


Vl  =  V2’ 

(d)  y2  =  3 y1  +  2 y3. 
i/3 =  ~y^ 


x1  =  3x1  —  8x2  +  2x3, 


u 


1  =  iq  —  3^i2  +  11^, 


(e)  x2  =  —  x1  +  2x2  +  2x3,  (f)  u2  =  2 u1  —  6u2  +  16i^3, 


x 


3  =  x1  -  4x2  +  2x3; 


10.1.13.  Solve  the  following  initial  value  problems:  (a) 


du 


( b ) 


(d) 


(0 


du 

dt 

du 

dt 

du 

dt 


1  -2 
-2  1 

/  -i  3 
2  2-7 

V  0  3  -47 

/0  0  1  0\ 
0  0  0  2 
10  0  0 
\0  2  0  0/ 


u,  u(0)  = 

3  \ 

u,  u(0)  = 


(c) 


du 


dt 


(\ 


(e) 


du 

dt 


u,  u(2)  = 


(g) 


du 

dt 


dt 

1  2 
-1  1 

/  2 
-1 
V  o 

/  2  1 
-3  -2 


u 


0  2 
2  0 


3  =  —  3^2  +  7i^3. 


)U,  u(l)=(j); 


u,  u(0) 


0 


1 

0 

1 


u,  u(tt ) 


V 


0 

0 


0 

0 


1 

0 

1 

1 


o\ 
1 
2 
1  J 


u. 


u(o)  = 


10.1.14.  (a)  Find  the  solution  to  the  system 


dx 

dt 


x  +  y, 


dy_ 

dt 


x  —  y,  that  has  initial 


conditions  x(0)  =  1,  y( 0)  =  0.  (b)  Sketch  a  phase  portrait  of  the  system  that  shows  several 
typical  solution  trajectories,  including  the  solution  you  found  in  part  (a).  Clearly  indicate 
the  direction  of  increasing  t  on  your  curves. 

10.1.15. A planar steady-state fluid flow has velocity vector field $\mathbf{v} = (2x - 3y,\, x - y)^T$
at position $\mathbf{x} = (x, y)^T$. The corresponding fluid motion is described by the differential
equation $\dfrac{d\mathbf{x}}{dt} = \mathbf{v}$. A floating object starts out at the point $(1, 1)^T$. Find its position
after one time unit.


10.1.16. A steady-state fluid flow has velocity vector field $\mathbf{v} = (-2y,\, 2x,\, z)^T$ at position
$\mathbf{x} = (x, y, z)^T$. Describe the motion of the fluid particles as governed by the differential
equation $\dfrac{d\mathbf{x}}{dt} = \mathbf{v}$.
du

dt 


-6 

1 


1 

6 


10.1.17.  Solve  the  initial  value  problem 
orthogonality  can  help. 

10.1.18.  (a)  Find  the  eigenvalues  and  eigenvectors  of  K  = 


)  u,  u(o)  =  (  2).  Explain  how 


( 


\ 


1  -1  0\ 

-1  2  -1  . 

0-1  l) 

(b)  Verify  that  the  eigenvectors  are  mutually  orthogonal,  (c)  Based  on  part  (a),  is  K 
positive  definite,  positive  semi-definite,  or  indefinite?  (d)  Solve  the  initial  value  problem 


du 

dt 


=  K  u,  u(0)  = 


using  orthogonality  to  simplify  the  computations. 


10.1.19.  Demonstrate  that  one  can  also  solve  the  initial  value  problem  in  Example  10.8  by 
writing  the  solution  as  a  complex  linear  combination  of  the  complex  eigensolutions,  and 
then  using  the  initial  conditions  to  specify  the  coefficients. 

10.1.20.  Determine  whether  the  following  vector- valued  functions  are  linearly  dependent  or 
linearly  independent: 

l2 


(a) 


(e) 


( h ) 


1 

t 


-t 

1 


(0 


1  +  t 
t 


2  £  o  1  \  /  2 1  •  0  4- 

e  cos  St  \  e  sin  St 


—  e2t  sin  3t 


e2t  cos  3 1 


1  -  V 
t  -  t 


(0 


> (c) 

cos  3 1 
sin  3 1 


1 

t 


t 
2 

sin  3 1 
cos  3 1 


( 


A 


— e 


V  eV 


/  p}\ 


V-eV 


/ 


V  eV 


(i) 


e*  \ 
tet 

\t2elJ 


(t2e*\  /ie 


V  ie*  / 


e' 


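♦ 10.1.21. Let $A$ be a constant matrix. Suppose $\mathbf{u}(t)$ solves the initial value problem $\dot{\mathbf{u}} = A\mathbf{u}$,
$\mathbf{u}(0) = \mathbf{b}$. Prove that the solution to the initial value problem $\dot{\mathbf{u}} = A\mathbf{u}$, $\mathbf{u}(t_0) = \mathbf{b}$, is equal
to $\widetilde{\mathbf{u}}(t) = \mathbf{u}(t - t_0)$. How are the solution trajectories related?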


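10.1.22. Suppose $\mathbf{u}(t)$ and $\widetilde{\mathbf{u}}(t)$ both solve the linear system $\dot{\mathbf{u}} = A\mathbf{u}$. (a) Suppose they have
the same value $\mathbf{u}(t_1) = \widetilde{\mathbf{u}}(t_1)$ at any one time $t_1$. Show that they are, in fact, the same
solution: $\mathbf{u}(t) = \widetilde{\mathbf{u}}(t)$ for all $t$. (b) What happens if $\mathbf{u}(t_1) = \widetilde{\mathbf{u}}(t_2)$ for some $t_1 \ne t_2$?
Hint: See Exercise 10.1.21.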


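10.1.23. Prove that the general solution to a linear system $\dot{\mathbf{u}} = \Lambda\mathbf{u}$ with diagonal coefficient
matrix $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is given by $\mathbf{u}(t) = \left(c_1 e^{\lambda_1 t}, \ldots, c_n e^{\lambda_n t}\right)^T$.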

10.1.24. Show that if $\mathbf{u}(t)$ is a solution to $\dot{\mathbf{u}} = A\mathbf{u}$, and $S$ is a constant, nonsingular matrix of
the same size as $A$, then $\mathbf{v}(t) = S\,\mathbf{u}(t)$ solves the linear system $\dot{\mathbf{v}} = B\mathbf{v}$, where $B = SAS^{-1}$
is similar to $A$.

♦ 10.1.25. (i) Combine Exercises 10.1.23-24 to show that if $A = S\Lambda S^{-1}$ is diagonalizable, then
the solution to $\dot{\mathbf{u}} = A\mathbf{u}$ can be written as $\mathbf{u}(t) = S\left(c_1 e^{\lambda_1 t}, \ldots, c_n e^{\lambda_n t}\right)^T$, where $\lambda_1, \ldots, \lambda_n$
are its eigenvalues and $S = (\,\mathbf{v}_1\ \mathbf{v}_2\ \ldots\ \mathbf{v}_n\,)$ is the corresponding matrix of eigenvectors.

(ii) Write the general solution to the systems in Exercise 10.1.13 in this form.


The  General  Case 

Summarizing the preceding subsection, if the coefficient matrix of a homogeneous, autonomous
first order linear system is complete, then the eigensolutions form a (complex)
basis for the solution space. Assuming the coefficient matrix is real, one obtains a real
basis by taking the real and imaginary parts of each complex conjugate pair of solutions.
In the incomplete cases, the formulas for the basis solutions are a little more intricate, and
involve polynomials as well as (complex) exponentials. Readers who did not cover Section 8.6
are advised to skip ahead to Section 10.2; only Theorem 10.13, which summarizes
the key features, will be used in the sequel.

Example 10.9. The simplest incomplete case arises as the phase plane equivalent of a
scalar ordinary differential equation whose characteristic equation has a repeated root. For
example, to directly solve the second order equation

$$\frac{d^2u}{dt^2} - 2\,\frac{du}{dt} + u = 0, \tag{10.16}$$

we substitute the usual exponential ansatz $u = e^{\lambda t}$, leading to the characteristic equation

$$\lambda^2 - 2\lambda + 1 = 0.$$

There is only one double root, $\lambda = 1$, and hence, up to scalar multiple, only one exponential
solution $u_1(t) = e^t$. For a scalar ordinary differential equation, the second,
“missing” solution is obtained by simply multiplying the first by $t$, so that $u_2(t) = t\,e^t$. As
a result, the general solution to (10.16) is

$$u(t) = c_1 u_1(t) + c_2 u_2(t) = c_1 e^t + c_2\,t\,e^t.$$

As in (10.8), the equivalent phase plane system is

$$\frac{d\mathbf{u}}{dt} = \begin{pmatrix} 0 & 1 \\ -1 & 2 \end{pmatrix}\mathbf{u}, \qquad \text{where} \qquad \mathbf{u}(t) = \begin{pmatrix} u(t) \\ \dot u(t) \end{pmatrix}.$$

Note that the coefficient matrix is incomplete: it has $\lambda = 1$ as a double eigenvalue,
but only one independent eigenvector, namely $\mathbf{v} = (1, 1)^T$. The two linearly independent
solutions to the phase plane system can be constructed from the two solutions to the scalar
equation. Thus,

$$\mathbf{u}_1(t) = \begin{pmatrix} e^t \\ e^t \end{pmatrix}, \qquad \mathbf{u}_2(t) = \begin{pmatrix} t\,e^t \\ t\,e^t + e^t \end{pmatrix}$$

form a basis for the two-dimensional solution space. The first is an eigensolution, while the
second includes an additional polynomial factor. Observe that, in contrast to the scalar
case, the second solution $\mathbf{u}_2$ is not obtained from the first by merely multiplying by $t$.


In  general,  the  eigenvectors  of  an  incomplete  matrix  fail  to  form  a  basis,  and,  as  noted 
in  Section  8.6,  can  be  extended  to  a  Jordan  basis.  Thus,  the  key  step  is  to  describe  the 
solutions  associated  with  a  Jordan  chain,  cf.  Definition  8.47. 


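Lemma 10.10. Suppose $\mathbf{w}_1, \ldots, \mathbf{w}_k$ form a Jordan chain of length $k$ for the eigenvalue
$\lambda$ of the matrix $A$. Then there are $k$ linearly independent solutions to the corresponding
first order system $\dot{\mathbf{u}} = A\mathbf{u}$ having the form

$$\mathbf{u}_1(t) = e^{\lambda t}\mathbf{w}_1, \qquad \mathbf{u}_2(t) = e^{\lambda t}\left(t\,\mathbf{w}_1 + \mathbf{w}_2\right), \qquad \mathbf{u}_3(t) = e^{\lambda t}\left(\tfrac{1}{2}t^2\,\mathbf{w}_1 + t\,\mathbf{w}_2 + \mathbf{w}_3\right),$$

and, in general,

$$\mathbf{u}_i(t) = e^{\lambda t}\sum_{j=1}^{i}\frac{t^{\,i-j}}{(i-j)!}\,\mathbf{w}_j, \qquad 1 \le i \le k. \tag{10.17}$$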

The proof is by direct substitution of the formulas (10.17) into the differential equation,
invoking the defining relations (8.46) of the Jordan chain as needed; details are left to the
reader. If $\lambda$ is a complex eigenvalue, then the Jordan chain solutions (10.17) will involve
complex exponentials. As usual, if $A$ is a real matrix, they can be split into their real and
imaginary parts, which are independent real solutions.
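The formula (10.17) is easy to test numerically on the phase plane system of Example 10.9. The following is a minimal sketch, assuming NumPy is available and assuming the usual Jordan chain relations $(A - \lambda I)\mathbf{w}_1 = \mathbf{0}$, $(A - \lambda I)\mathbf{w}_j = \mathbf{w}_{j-1}$; the function names `jordan_chain_solution` and `residual` are hypothetical. It evaluates $\mathbf{u}_i(t)$ from a given chain and measures how closely it satisfies $\dot{\mathbf{u}} = A\mathbf{u}$, using a central difference in place of the exact derivative.

```python
import math
import numpy as np

def jordan_chain_solution(lam, chain, i, t):
    """u_i(t) = e^{lam t} * sum_{j=1}^{i} t^(i-j)/(i-j)! * w_j, with 1-based index i."""
    total = np.zeros_like(chain[0], dtype=float)
    for j in range(1, i + 1):
        total += t ** (i - j) / math.factorial(i - j) * chain[j - 1]
    return np.exp(lam * t) * total

# Phase plane system of Example 10.9: double eigenvalue 1, one eigenvector.
A = np.array([[0.0, 1.0],
              [-1.0, 2.0]])
lam = 1.0
w1 = np.array([1.0, 1.0])   # eigenvector: (A - I) w1 = 0
w2 = np.array([0.0, 1.0])   # generalized eigenvector: (A - I) w2 = w1
chain = [w1, w2]

def residual(i, t, h=1e-6):
    """Norm of u_i'(t) - A u_i(t), with u_i' approximated by a central difference."""
    forward = jordan_chain_solution(lam, chain, i, t + h)
    backward = jordan_chain_solution(lam, chain, i, t - h)
    du = (forward - backward) / (2 * h)
    return np.linalg.norm(du - A @ jordan_chain_solution(lam, chain, i, t))

for t in (0.0, 0.3, 1.0):
    print(t, residual(1, t), residual(2, t))
```

With this chain, $\mathbf{u}_2(t) = e^t\,(t,\ t+1)^T$, which agrees with the second basis solution of Example 10.9.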

Example  10.11.  The  coefficient  matrix  of  the  system 


du 

dt 


0 

2 

0 

-1 

0 


0 

1 

0 

1 

-1 


u 


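is incomplete; it has only 2 linearly independent eigenvectors associated with the eigenvalues
1 and −2. Using the Jordan basis computed in Example 8.52, we produce the following
5 linearly independent solutions:

$$\mathbf{u}_1(t) = e^t\,\mathbf{v}_1, \qquad \mathbf{u}_2(t) = e^t\left(t\,\mathbf{v}_1 + \mathbf{v}_2\right), \qquad \mathbf{u}_3(t) = e^t\left(\tfrac{1}{2}t^2\,\mathbf{v}_1 + t\,\mathbf{v}_2 + \mathbf{v}_3\right),$$

$$\mathbf{u}_4(t) = e^{-2t}\,\mathbf{v}_4, \qquad \mathbf{u}_5(t) = e^{-2t}\left(t\,\mathbf{v}_4 + \mathbf{v}_5\right),$$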


or,  explicitly, 


°\ 

0 

0 

e4 

e4  / 


0 


\ 


V( 


e 

0 

—  tel 
1  At)  et  / 


\ 


0 

i  e4 

0 

(i-  p2)e 

V(-t+p2)ety 


/  “e“2t  \ 
e~2t  ' 

e~2t 

—  2e~2t 

Vo/ 


2 1 


-  (1  +  i)  6 
te~2t 
te~2t 

2  (1  +  t)e~2t 


\ 


-2 1 


The first three solutions are associated with the $\lambda_1 = 1$ Jordan chain, the last two with
the $\lambda_2 = -2$ chain. The eigensolutions are the pure exponentials $\mathbf{u}_1(t)$, $\mathbf{u}_4(t)$. The general
solution to the system is an arbitrary linear combination of these five basis solutions.


Proposition 10.12. Let $A$ be an $n \times n$ matrix. Then the Jordan chain solutions (10.17)
constructed from a Jordan basis of $A$ form a basis for the $n$-dimensional solution space for
the corresponding linear system $\dot{\mathbf{u}} = A\mathbf{u}$.

The  proof  of  linear  independence  of  the  Jordan  chain  solutions  follows,  via  Lemma  10.4, 
from  the  linear  independence  of  the  Jordan  basis  vectors,  which  are  their  initial  values. 

Important  qualitative  features  can  be  readily  gleaned  from  the  algebraic  structure  of  the 
solution  formulas  (10.17).  The  following  result  describes  the  principal  classes  of  solutions 
of  homogeneous  autonomous  linear  systems  of  ordinary  differential  equations. 

Theorem 10.13. Let $A$ be a real $n \times n$ matrix. Every real solution to the linear system
$\dot{\mathbf{u}} = A\mathbf{u}$ is a linear combination of $n$ linearly independent solutions appearing in the
following four classes:

(1) If $\lambda$ is a complete real eigenvalue of multiplicity $m$, then there exist $m$ linearly independent
solutions of the form

$$\mathbf{u}_k(t) = e^{\lambda t}\,\mathbf{v}_k, \qquad k = 1, \ldots, m,$$

where $\mathbf{v}_1, \ldots, \mathbf{v}_m$ are linearly independent eigenvectors.

(2) If $\lambda_\pm = \mu \pm i\,\nu$ form a pair of complete complex conjugate eigenvalues of multiplicity
$m$, then there exist $2m$ linearly independent real solutions of the forms

$$\mathbf{u}_k(t) = e^{\mu t}\left[\cos(\nu t)\,\mathbf{x}_k - \sin(\nu t)\,\mathbf{y}_k\right], \qquad \widehat{\mathbf{u}}_k(t) = e^{\mu t}\left[\sin(\nu t)\,\mathbf{x}_k + \cos(\nu t)\,\mathbf{y}_k\right], \qquad k = 1, \ldots, m,$$

where $\mathbf{v}_k^{\pm} = \mathbf{x}_k \pm i\,\mathbf{y}_k$ are the associated complex conjugate eigenvectors.

(3) If $\lambda$ is an incomplete real eigenvalue of multiplicity $m$ and $r$ is the dimension of the
eigenspace $V_\lambda$, then there exist $m$ linearly independent solutions of the form

$$\mathbf{u}_k(t) = e^{\lambda t}\,\mathbf{p}_k(t), \qquad k = 1, \ldots, m,$$

where $\mathbf{p}_k(t)$ is a vector of polynomials of degree $\le m - r$.

(4) If $\lambda_\pm = \mu \pm i\,\nu$ form a pair of incomplete complex conjugate eigenvalues of multiplicity
$m$, and $r$ is the common dimension of the two eigenspaces, then there exist $2m$
linearly independent real solutions

$$\mathbf{u}_k(t) = e^{\mu t}\left[\cos(\nu t)\,\mathbf{p}_k(t) - \sin(\nu t)\,\mathbf{q}_k(t)\right], \qquad \widehat{\mathbf{u}}_k(t) = e^{\mu t}\left[\sin(\nu t)\,\mathbf{p}_k(t) + \cos(\nu t)\,\mathbf{q}_k(t)\right], \qquad k = 1, \ldots, m,$$

where $\mathbf{p}_k(t), \mathbf{q}_k(t)$ are vectors of polynomials of degree $\le m - r$, whose detailed
structure can be gleaned from Lemma 10.10.


As a result, every real solution to a homogeneous linear system of ordinary differential
equations is a vector-valued function whose entries are linear combinations of functions of
the particular form $t^k e^{\mu t}\cos\nu t$ and $t^k e^{\mu t}\sin\nu t$, i.e., sums of products of exponentials,
trigonometric functions, and polynomials. The exponents $\mu$ are the real parts of the eigenvalues
of the coefficient matrix; the trigonometric frequencies $\nu$ are the imaginary parts of
the eigenvalues; nonconstant polynomials appear only if the matrix is incomplete.


Exercises 

du. 

10.1.26.  Find  the  general  solution  to  the  linear  system  — —  =  A u  for  the  following  incomplete 


dt 


coefficient  matrices:  (a) 


(■ d ) 


(e) 


2  1 
0  2 

/  —3 
1 

V  o 


>  (b) 


2  -1 
9  -4 


(c) 


-1  -1 

4  -5 


1  ^ 

3  -1 

1  -3 ) 


(0 


(  3 

1 

1 

1\ 

/ 

0 

1 

1 

o\ 

0 

-1 

0 

1 

,  (g) 

-l 

0 

0 

1 

0 

0 

3 

1 

0 

0 

0 

1 

\0 

0 

0 

-ij 

V 

0 

0 

-1 

0  ) 

10.1.27.  Find  a  first  order  system  of  ordinary  differential  equations  that  has  the  indicated 


vector-valued  function  as  a  solution:  (a) 


t  1  (  e  ^cos3 1 


e  +  e 
2et 


.  (b) 


—  3e  *  sin  3 1 


,  (c) 


1 

t  -  1 


(d) 


sin  2 1  —  cos  2 1 
sin  2 1  +  3  cos  2 1 


(  e2t  \ 

(  sin  t  ^ 

(  t  \ 

(  sin  t  ^ 

.  (e)  9  e-3t  „ 

,  (f)  cos t 

,  (g) 

1  -t2 

,  (b) 

2  el  cos  t 

\  e2t  —  e  3t  ) 

\  1  J 

K 1  + t  j 

K  et  sin  t  ) 

10.1.28.  Which  sets  of  functions  in  Exercise  10.1.20  can  be  solutions  to  a  common  first  order, 
homogeneous,  constant  coefficient  linear  system  of  ordinary  differential  equations?  If  so, 
find  a  system  they  satisfy;  if  not,  explain  why  not. 
10.1.29. Solve the third order equation $\dfrac{d^3u}{dt^3} + 3\,\dfrac{d^2u}{dt^2} + 4\,\dfrac{du}{dt} + 12\,u = 0$ by converting it into a
first order system. Compare your answer with what you found in Exercise 10.1.2.


10.1.30.  Solve  the  second  order  coupled  system  of  ordinary  differential  equations  u  =  u  +  u  —  v, 
v  =  v  —  u  v,  by  converting  it  into  a  first  order  system  involving  four  variables. 

10.1.31. Suppose that $\mathbf{u}(t) \in \mathbb{R}^n$ is a polynomial solution to the constant coefficient linear
system $\dot{\mathbf{u}} = A\mathbf{u}$. What is the maximal possible degree of $\mathbf{u}(t)$? What can you say about $A$
when $\mathbf{u}(t)$ has maximal degree?

♦ 10.1.32. (a) Under the assumption that $\mathbf{w}_1, \ldots, \mathbf{w}_k$ form a Jordan chain for the coefficient
matrix $A$, prove that the functions (10.17) are solutions to the system $\dot{\mathbf{u}} = A\mathbf{u}$.

(b)  Prove  that  they  are  linearly  independent. 
With the general solution formulas in hand, we are now ready to study the qualitative
features of first order linear dynamical systems. Our primary focus will be on stability
properties of the equilibrium solution(s). A solution to an autonomous system of first
order ordinary differential equations $\dot{\mathbf{u}} = \mathbf{f}(\mathbf{u})$ is called an equilibrium solution if it remains
constant for all $t$, so $\mathbf{u}(t) = \mathbf{u}^\star$. Since its derivative vanishes, this implies that the
equilibrium point $\mathbf{u}^\star$ satisfies $\mathbf{f}(\mathbf{u}^\star) = \mathbf{0}$. In particular, for a homogeneous linear system
$\dot{\mathbf{u}} = A\mathbf{u}$, the origin $\mathbf{u}^\star = \mathbf{0}$ is always an equilibrium point, meaning that a solution that
starts out at $\mathbf{0}$ remains there. The complete set of equilibrium solutions consists of all
points $\mathbf{u}^\star \in \ker A$ in the kernel of the coefficient matrix, and so the set of equilibrium
solutions forms a subspace, indeed an invariant subspace, of the configuration space.

In physical applications, the stability properties of equilibrium solutions are of crucial
importance; see the discussion at the beginning of Chapter 5. In general, an equilibrium
point  is  stable  if  every  solution  that  starts  out  nearby  stays  nearby.  An  equilibrium  is 
called  asymptotically  stable  if  the  nearby  solutions  converge  to  it  as  time  increases.  The 
formal  mathematical  definitions  are  as  follows. 


Definition 10.14. An equilibrium solution $\mathbf{u}^\star$ to an autonomous system of first order
ordinary differential equations $\dot{\mathbf{u}} = \mathbf{f}(\mathbf{u})$ is called

• stable if for every sufficiently small $\varepsilon > 0$, there exists a $\delta > 0$ such that every solution
$\mathbf{u}(t)$ having initial conditions within distance $\delta > \|\mathbf{u}(t_0) - \mathbf{u}^\star\|$ of the equilibrium
remains within distance $\varepsilon > \|\mathbf{u}(t) - \mathbf{u}^\star\|$ for all $t \ge t_0$.

• asymptotically stable if it is stable and, in addition, there exists $\delta_0 > 0$ such that
whenever $\|\mathbf{u}(t_0) - \mathbf{u}^\star\| < \delta_0$, then $\mathbf{u}(t) \to \mathbf{u}^\star$ as $t \to \infty$.


Thus, although solutions nearby a stable equilibrium point may drift slightly farther
away, they must remain relatively close. In the case of asymptotic stability, they will eventually
return to equilibrium. An equilibrium point is called globally stable if the stability
condition holds for all $\varepsilon > 0$. It is called globally asymptotically stable if every solution
converges to the equilibrium point: $\mathbf{u}(t) \to \mathbf{u}^\star$ as $t \to \infty$.

In the case of a linear system, local (asymptotic) stability implies global (asymptotic)
stability. This is because, by linearity, if $\mathbf{u}(t)$ is a solution, then so is the scalar multiple
$c\,\mathbf{u}(t)$ for all $c \in \mathbb{R}$, and hence every solution can be scaled to one that remains nearby the
equilibrium point. We will henceforth omit the redundant term “global” when discussing
the stability of a linear system. We will also focus our attention on the particular equilibrium
solution $\mathbf{u}^\star = \mathbf{0}$.

Figure 10.2. The Left Half-Plane.

Remark. The stability and asymptotic stability of an equilibrium solution are independent
of the choice of norm in the definition (although this will affect the dependence of $\delta$ on $\varepsilon$).
This follows from the equivalence of norms described in Theorem 3.17.

The starting point is a simple calculus lemma, whose proof is left to the reader.

Lemma 10.15. Let $\mu, \nu$ be real and $k \ge 0$. A function of the form

$$f(t) = t^k e^{\mu t}\cos\nu t \qquad \text{or} \qquad t^k e^{\mu t}\sin\nu t \tag{10.18}$$

will decay to zero for large $t$, so $\lim_{t \to \infty} f(t) = 0$, if and only if $\mu < 0$. The function remains
bounded, so $|f(t)| \le C$ for some constant $C$, for all $t \ge 0$ if and only if either $\mu < 0$, or
$\mu = 0$ and $k = 0$.

Loosely put, exponential decay will always overwhelm polynomial growth, while the
trigonometric sine and cosine functions remain neutrally bounded. Now, in the solution
to our linear system, the functions (10.18) come from the eigenvalues $\lambda = \mu + i\nu$ of the
coefficient matrix. The lemma implies that the asymptotic behavior of the solutions, and
hence the stability of the system, depends on the sign of $\mu = \operatorname{Re} \lambda$. If $\mu < 0$, then the
solutions decay to zero at an exponential rate as $t \to \infty$. If $\mu > 0$, then the solutions
become unbounded as $t \to \infty$. In the borderline case $\mu = 0$, the solutions remain bounded,
provided that they don’t involve any powers of $t$.

Thus, in order that the equilibrium zero solution be asymptotically stable, all the eigenvalues
must satisfy $\mu = \operatorname{Re} \lambda < 0$. Or, stated another way, all eigenvalues must lie in the left
half-plane, that is, the subset of the complex plane $\mathbb{C}$ to the left of the imaginary axis sketched in
Figure 10.2. In this manner, we have demonstrated the fundamental asymptotic stability
criterion† for linear systems.

† This is not the same as the stability criterion for linear iterative systems, which requires that
the eigenvalues of the coefficient matrix lie inside the unit circle, cf. Theorem 9.12.
Theorem 10.16. A first order autonomous homogeneous linear system of ordinary differential
equations $\dot{\mathbf{u}} = A \mathbf{u}$ has an asymptotically stable zero solution if and only if all the
eigenvalues $\lambda$ of its coefficient matrix $A$ lie in the left half-plane: $\operatorname{Re} \lambda < 0$. If $A$ has one
or more eigenvalues with positive real part, $\operatorname{Re} \lambda > 0$, then the zero solution is unstable.


Example 10.17. Consider the system

$$\frac{du}{dt} = 2u - 6v + w, \qquad \frac{dv}{dt} = 3u - 3v - w, \qquad \frac{dw}{dt} = 3u - v - 3w.$$

The coefficient matrix $A = \begin{pmatrix} 2 & -6 & 1 \\ 3 & -3 & -1 \\ 3 & -1 & -3 \end{pmatrix}$ is found to have eigenvalues

$$\lambda_1 = -2, \qquad \lambda_2 = -1 + i\sqrt{6}, \qquad \lambda_3 = -1 - i\sqrt{6},$$

with respective real parts $-2, -1, -1$. The Stability Theorem 10.16 implies that the equilibrium
solution $u^* = v^* = w^* = 0$ is asymptotically stable. Indeed, every solution involves
the functions $e^{-2t}$, $e^{-t} \cos \sqrt{6}\, t$, and $e^{-t} \sin \sqrt{6}\, t$, all of which decay to 0 at an exponential
rate. The latter two have the slowest decay rate, and so most solutions to the linear system
go to 0 in proportion to $e^{-t}$, i.e., at an exponential rate determined by the least negative
real part.
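The eigenvalue claims in Example 10.17 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative and not part of the text.

```python
import numpy as np

# Coefficient matrix of Example 10.17.
A = np.array([[2.0, -6.0, 1.0],
              [3.0, -3.0, -1.0],
              [3.0, -1.0, -3.0]])

eigenvalues = np.linalg.eigvals(A)
real_parts = np.sort(eigenvalues.real)

# Theorem 10.16: the zero solution is asymptotically stable
# exactly when every eigenvalue has negative real part.
is_asymptotically_stable = bool(np.all(real_parts < 0))

# The slowest decay rate is set by the least negative real part.
decay_rate = real_parts.max()

print(eigenvalues)
print(is_asymptotically_stable, decay_rate)
```

The computed real parts should agree, up to rounding, with the values $-2, -1, -1$ quoted in the example.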


The  final  statement  is  a  special  case  of  the  following  general  result,  whose  proof  is  left 
to  the  reader. 


Proposition 10.18. If $\mathbf{u}(t)$ is any solution to $\dot{\mathbf{u}} = A \mathbf{u}$, then $\| \mathbf{u}(t) \| \leq C e^{a t}$ for all
$t \geq t_0$ and for all $a > a^* = \max \{\, \operatorname{Re} \lambda \mid \lambda \text{ is an eigenvalue of } A \,\}$, where the constant $C > 0$
depends on the solution and choice of norm. If the eigenvalue(s) $\lambda$ achieving the maximum,
$\operatorname{Re} \lambda = a^*$, are complete, then one can set $a = a^*$.


Asymptotic  stability  implies  that  the  solutions  return  to  equilibrium;  stability  only 
requires  them  to  stay  nearby.  The  appropriate  eigenvalue  criterion  is  readily  established. 

Theorem 10.19. A first order linear, homogeneous, constant-coefficient system of ordinary
differential equations (10.1) has a stable zero solution if and only if all its eigenvalues
satisfy $\operatorname{Re} \lambda \leq 0$, and, moreover, any eigenvalue lying on the imaginary axis, so $\operatorname{Re} \lambda = 0$,
is complete, meaning that it has as many independent eigenvectors as its multiplicity.


Proof: The proof is the same as before, based on Theorem 10.13 and the decay properties
in Lemma 10.15. All the eigenvalues with negative real part lead to exponentially decaying
solutions, even if they are incomplete. If the coefficient matrix has a complete zero eigenvalue,
then the corresponding eigensolutions are all constant, and hence trivially bounded.
On the other hand, if 0 is an incomplete eigenvalue, then the associated Jordan chain solutions
involve non-constant polynomials, and become unbounded as $t \to \pm\infty$. Similarly,
if a purely imaginary eigenvalue is complete, then the associated solutions only involve
trigonometric functions, and hence remain bounded, whereas the solutions associated with
an incomplete purely imaginary eigenvalue contain polynomials in $t$ multiplying sines and
cosines, and hence cannot remain bounded. Q.E.D.
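The criteria of Theorems 10.16 and 10.19 can be turned into a rough numerical test. The sketch below is illustrative only: the function name `classify_stability` and the tolerance `tol` are assumptions, and completeness is tested by comparing the algebraic multiplicity of each eigenvalue on the imaginary axis with the dimension of its eigenspace. Floating-point eigenvalues of defective matrices are sensitive to rounding, so borderline results should be treated with caution.

```python
import numpy as np

def classify_stability(A, tol=1e-8):
    """Classify the zero solution of du/dt = A u (illustrative helper)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigenvalues = np.linalg.eigvals(A)
    real_parts = eigenvalues.real

    if np.all(real_parts < -tol):
        return "asymptotically stable"
    if np.any(real_parts > tol):
        return "unstable"

    # Borderline case: every eigenvalue on the imaginary axis must be complete.
    for lam in eigenvalues[np.abs(real_parts) <= tol]:
        # Algebraic multiplicity: number of computed eigenvalues close to lam.
        algebraic = int(np.sum(np.abs(eigenvalues - lam) <= np.sqrt(tol)))
        # Geometric multiplicity: dimension of the kernel of A - lam I.
        rank = np.linalg.matrix_rank(A - lam * np.eye(n), tol=np.sqrt(tol))
        if n - rank < algebraic:
            return "unstable"
    return "stable"

print(classify_stability([[0.0, 1.0], [-1.0, 0.0]]))   # complete eigenvalues +i, -i
print(classify_stability([[0.0, 1.0], [0.0, 0.0]]))    # incomplete zero eigenvalue
```

The second matrix has a double zero eigenvalue with only one independent eigenvector, so its solutions grow linearly in $t$ and the zero solution is not stable.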

A particularly important class of systems consists of the linear gradient flows

$$\frac{d\mathbf{u}}{dt} = -K \mathbf{u}, \tag{10.19}$$

in which $K$ is a symmetric, positive definite matrix. According to Theorem 8.35, all the
eigenvalues of $K$ are real and positive, and so the eigenvalues of the negative definite
coefficient matrix $-K$ for the gradient flow system (10.19) are real and negative. Applying
Theorem 10.16, we conclude that the zero solution to any gradient flow system (10.19) with
negative definite coefficient matrix $-K$ is asymptotically stable. If the coefficient matrix
is negative semi-definite, then the equilibrium solutions are stable, since the eigenvalues are
necessarily complete.


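Example 10.20. On applying the test we learned in Chapter 3, the matrix $K = \begin{pmatrix} 1 & 1 \\ 1 & 5 \end{pmatrix}$
is seen to be positive definite. The associated gradient flow is

$$\frac{du}{dt} = -u - v, \qquad \frac{dv}{dt} = -u - 5v. \tag{10.20}$$

The eigenvalues and eigenvectors of $-K = \begin{pmatrix} -1 & -1 \\ -1 & -5 \end{pmatrix}$ are

$$\lambda_1 = -3 + \sqrt{5}, \quad \mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 - \sqrt{5} \end{pmatrix}, \qquad \lambda_2 = -3 - \sqrt{5}, \quad \mathbf{v}_2 = \begin{pmatrix} 1 \\ 2 + \sqrt{5} \end{pmatrix}.$$

Therefore, the general solution to the system is

$$\mathbf{u}(t) = c_1 e^{(-3+\sqrt{5})t} \begin{pmatrix} 1 \\ 2 - \sqrt{5} \end{pmatrix} + c_2 e^{(-3-\sqrt{5})t} \begin{pmatrix} 1 \\ 2 + \sqrt{5} \end{pmatrix},$$

or, in components,

$$u(t) = c_1 e^{(-3+\sqrt{5})t} + c_2 e^{(-3-\sqrt{5})t}, \qquad v(t) = c_1 (2 - \sqrt{5})\, e^{(-3+\sqrt{5})t} + c_2 (2 + \sqrt{5})\, e^{(-3-\sqrt{5})t}.$$

All solutions tend to zero as $t \to \infty$ at the exponential rate prescribed by the least negative
eigenvalue, which is $-3 + \sqrt{5} \approx -.7639$. This confirms the asymptotic stability of the
gradient flow.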
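As a quick check of Example 10.20, the eigenvalues and eigenvectors of $-K$ can be computed numerically. This is a minimal sketch assuming NumPy; `eigh` is used because the matrix is symmetric, and it returns the eigenvalues in ascending order.

```python
import numpy as np

K = np.array([[1.0, 1.0],
              [1.0, 5.0]])

# eigh is intended for symmetric matrices; eigenvalues come back sorted ascending.
eigenvalues, eigenvectors = np.linalg.eigh(-K)

expected = np.array([-3.0 - np.sqrt(5.0), -3.0 + np.sqrt(5.0)])
slowest_rate = eigenvalues.max()          # least negative eigenvalue

# Slope v/u of the eigenvector belonging to the least negative eigenvalue.
slope = eigenvectors[1, 1] / eigenvectors[0, 1]

print(eigenvalues, slowest_rate, slope)
```

The slope should match $2 - \sqrt{5}$, the second component of the eigenvector normalized so that its first component is 1.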


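The reason for the term “gradient flow” is that the vector field $-K\mathbf{u}$ appearing on the
right-hand side of (10.19) is, in fact, the negative of the gradient of the quadratic function

$$q(\mathbf{u}) = \tfrac{1}{2}\, \mathbf{u}^T K \mathbf{u} = \tfrac{1}{2} \sum_{i,j=1}^{n} k_{ij}\, u_i u_j, \tag{10.21}$$

so that $\nabla q(\mathbf{u}) = K \mathbf{u}$. Thus, we can write (10.19) as

$$\frac{d\mathbf{u}}{dt} = -\nabla q(\mathbf{u}). \tag{10.22}$$

For the particular system (10.20),

$$q(u, v) = \tfrac{1}{2} \begin{pmatrix} u & v \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 5 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \tfrac{1}{2} u^2 + uv + \tfrac{5}{2} v^2,$$

and so the gradient flow is given by

$$\frac{du}{dt} = -\frac{\partial q}{\partial u} = -u - v, \qquad \frac{dv}{dt} = -\frac{\partial q}{\partial v} = -u - 5v.$$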


As you learn in multivariable calculus, [2, 78], the gradient $\nabla q$ of a function $q$ points
in the direction of its steepest increase, while its negative $-\nabla q$ points in the direction of
steepest decrease. Thus, the solutions to the gradient flow system (10.22) will decrease $q(\mathbf{u})$
as rapidly as possible, tending to its minimum at $\mathbf{u}^* = \mathbf{0}$. For instance, if $q(u, v)$ represents
the height of a hill at position $(u, v)$, then the solutions to (10.22) are the paths of steepest
descent followed by, say, water flowing down the hill (provided we ignore inertial effects). In
physical applications, the quadratic function (10.21) often represents the potential energy
in the system, and the gradient flow models the natural behavior of systems that seek to
minimize their energy as rapidly as possible.
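The decrease of the quadratic function along a gradient flow can be observed numerically. The sketch below is an illustration under stated assumptions: it uses the matrix $K$ of Example 10.20, an arbitrarily chosen initial vector `u0`, and the exact solution obtained from the spectral factorization $K = V \operatorname{diag}(\mu_1, \mu_2)\, V^T$; the helper names `q` and `flow` are hypothetical.

```python
import numpy as np

K = np.array([[1.0, 1.0],
              [1.0, 5.0]])

def q(u):
    """Quadratic function q(u) = (1/2) u^T K u."""
    return 0.5 * u @ K @ u

# Spectral factorization K = V diag(mu) V^T with orthogonal V.
mu, V = np.linalg.eigh(K)

def flow(u0, t):
    """Exact solution of du/dt = -K u with u(0) = u0."""
    return V @ (np.exp(-mu * t) * (V.T @ u0))

u0 = np.array([1.0, -2.0])            # arbitrary initial condition (assumed)
times = np.linspace(0.0, 5.0, 11)
energies = np.array([q(flow(u0, t)) for t in times])

print(energies)
```

The printed values should decrease monotonically toward 0, in line with the steepest descent interpretation.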

Example 10.21. Another extremely important class of dynamical systems comprises
the Hamiltonian systems, first developed by the nineteenth-century Irish mathematician
William Rowan Hamilton, who also discovered quaternions, developed in Exercise 7.2.23.
In particular, a planar Hamiltonian system takes the form

$$\frac{du}{dt} = \frac{\partial H}{\partial v}, \qquad \frac{dv}{dt} = -\frac{\partial H}{\partial u}, \tag{10.23}$$

where $H(u, v)$ is known as the Hamiltonian function. If

$$H(u, v) = \tfrac{1}{2}\, a u^2 + b u v + \tfrac{1}{2}\, c v^2 \tag{10.24}$$

is a quadratic form, then the corresponding Hamiltonian system

$$\dot{u} = b u + c v, \qquad \dot{v} = -a u - b v, \tag{10.25}$$

is homogeneous linear, with coefficient matrix $A = \begin{pmatrix} b & c \\ -a & -b \end{pmatrix}$. The associated characteristic
equation is

$$\det(A - \lambda I) = \lambda^2 + (ac - b^2) = 0.$$

If $H$ is positive or negative definite, then $ac - b^2 > 0$, and so the eigenvalues are purely
imaginary: $\lambda = \pm\, i \sqrt{ac - b^2}$, and complete, since they are simple. Thus, the stability
criterion of Theorem 10.19 holds, and we conclude that planar Hamiltonian systems with
a definite Hamiltonian function are stable. On the other hand, if $H$ is indefinite, then the
coefficient matrix has one positive and one negative eigenvalue, and hence the Hamiltonian
system is unstable.

In  physical  applications,  the  Hamiltonian  function  H(u,v)  represents  the  energy  of  the 
system.  According  to  Exercise  10.2.22,  the  Hamiltonian  energy  function  is  automatically 
conserved,  meaning  that  it  is  constant  on  every  solution:  H(u(t),v(t))  =  constant.  This 
means  that  the  solutions  move  along  its  level  sets;  in  the  stable  cases  these  are  bounded 
ellipses,  whereas  in  the  unstable  cases  they  are  unbounded  hyperbolas. 
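Conservation of the Hamiltonian can be illustrated numerically for a quadratic $H(u,v) = \tfrac12 a u^2 + b u v + \tfrac12 c v^2$. The following is a sketch under stated assumptions: the coefficients `a, b, c` and the initial vector `u0` are sample values chosen so that $ac - b^2 > 0$, and the solution is evaluated through an eigendecomposition of the coefficient matrix rather than by any method prescribed in the text.

```python
import numpy as np

a, b, c = 2.0, 1.0, 3.0              # sample coefficients with a*c - b**2 > 0 (assumed)
A = np.array([[b, c],
              [-a, -b]])             # coefficient matrix of the Hamiltonian system

def H(u):
    """Quadratic Hamiltonian H(u, v) = a u^2 / 2 + b u v + c v^2 / 2."""
    return 0.5 * a * u[0]**2 + b * u[0] * u[1] + 0.5 * c * u[1]**2

# Exact solution via the eigendecomposition A = V diag(lam) V^{-1}.
lam, V = np.linalg.eig(A)
V_inv = np.linalg.inv(V)

def solution(u0, t):
    """Real solution of du/dt = A u with u(0) = u0."""
    return (V @ (np.exp(lam * t) * (V_inv @ u0))).real

u0 = np.array([1.0, 0.5])            # arbitrary initial condition (assumed)
times = np.linspace(0.0, 10.0, 21)
energies = np.array([H(solution(u0, t)) for t in times])

print(lam)
print(energies)
```

The eigenvalues should be purely imaginary, $\pm\, i\sqrt{5}$ for these coefficients, and the energy values should all equal $H(\mathbf{u}_0)$ up to rounding, so the trajectory stays on a single level set (an ellipse).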


Remark. The equations of classical mechanics, such as motion of masses (sun, planets,
comets, etc.) under gravitational attraction, can all be formulated as Hamiltonian systems,
[31]. Moreover, the Hamiltonian formulation is a crucial first step in the physical process
of quantizing the classical mechanical equations to determine the quantum mechanical
equations of motion, [54].


Exercises 


10.2.1. Classify the following systems according to whether the origin is (i) asymptotically
stable, (ii) stable, or (iii) unstable:

(a) $\dfrac{du}{dt} = -2u - v$, $\dfrac{dv}{dt} = u - 2v$;

(b) $\dfrac{du}{dt} = 2u - 5v$, $\dfrac{dv}{dt} = u - v$;

(c) $\dfrac{du}{dt} = -u - 2v$, $\dfrac{dv}{dt} = 2u - 5v$;

(d) $\dfrac{du}{dt} = -2v$, $\dfrac{dv}{dt} = 8u$;
(e)

(f) 

(g) 

(■ h ) 


du 

dt 

du 

dt 

du 

dt 

du 

dt 


n  dv  dw  0 

2u  —  v  +  w,  — -  =  —  u  —  2v  +  W)  — Su  —  3v  +  2w] 


dt 


dt 


—  u  —  2v. 


dv  dw 

=  bu  +  ov  —  4:W ,  ——=4  u  —  6w] 


dt 


dv 


dt 

dw 


2u  —  v-\-3w,  —=u  —  v-{-w,  — —  =  —  4a  +  v  —  5w; 


dt 


dt 


dv  _  dw 

u  +  v  —  w,  ——  =  —  2u  —  3  v  +  3  w ,  -r-  =  —v  +  w. 


dt 


dt 


10.2.2.  Write  out  the  formula  for  the  general  real  solution  to  the  system  in  Example  10.17  and 
verify  its  stability. 

10.2.3.  Write  out  and  solve  the  gradient  flow  system  corresponding  to  the  following  quadratic 
forms: (a) $u^2 + v^2$, (b) $uv$, (c) $4u^2 - 2uv + v^2$, (d) $2u^2 - uv - 2uw + 2v^2 - vw + 2w^2$.

10.2.4.  Write  out  and  solve  the  Hamiltonian  systems  corresponding  to  the  first  three  quadratic 
forms  in  Exercise  10.2.3.  Which  of  them  are  stable? 


10.2.5.  Which  of  the  following  2  x  2  systems  are  gradient  flows?  Which  are  Hamiltonian 
systems?  In  each  case,  discuss  the  stability  of  the  zero  solution. 

u  =  —  2u  +  v , 


(a) 


V 


u 


2v. 


(b) 


u 


u 


10.2.6.  (a)  Show  that  the  matrix  A  = 


(c) 

u  - 

■ 

=  V, 

(d) 

u  =  —v,  ,  x  u  =  —u  —  2v, 

(e)  ■ 

+ 

v  - 

=  u , 

v  =  u,  v  =  —2a  —  v. 

/ 

0 

1 

1 

°\ 

-1 

0 

0 

0 

0 

0 

1 

1 

has  A 

=  ±  i  as  incomplete  complex 

V 

0 

0 

-1 

o) 

conjugate eigenvalues. (b) Find the general real solution to $\dot{\mathbf{u}} = A\mathbf{u}$.

(c)  Explain  the  behavior  of  a  typical  solution.  Why  is  the  zero  solution  not  stable? 


10.2.7. Let $A$ be a real $3 \times 3$ matrix, and assume that the linear system $\dot{\mathbf{u}} = A\mathbf{u}$ has a periodic
solution of period $P$. Prove that every periodic solution of the system has period $P$. What
other types of solutions can there be? Is the zero solution necessarily stable?


10.2.8.  Are  the  conclusions  of  Exercise  10.2.7  valid  when  A  is  a  4  x  4  matrix? 

10.2.9. Let $A$ be a real $5 \times 5$ matrix, and assume that $A$ has eigenvalues $i, -i, -2, -1$ (and no
others). Is the zero solution to the linear system $\dot{\mathbf{u}} = A\mathbf{u}$ necessarily stable? Explain. Does
your answer change if $A$ is $6 \times 6$?


10.2.10. Prove that if $A$ is strictly diagonally dominant and each diagonal entry is negative,
then the zero equilibrium solution to the linear system of ordinary differential equations
$\dot{\mathbf{u}} = A\mathbf{u}$ is asymptotically stable.

10.2.11. True or false: The system $\dot{\mathbf{u}} = -H_n \mathbf{u}$, where $H_n$ is the $n \times n$ Hilbert matrix (1.72), is
asymptotically stable.


10.2.12. True or false: If the zero solution of the linear system of differential equations
$\dot{\mathbf{u}} = A\mathbf{u}$ is asymptotically stable, so is the zero solution of the linear iterative system
$\mathbf{u}^{(k+1)} = A \mathbf{u}^{(k)}$ with the same coefficient matrix.


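10.2.13. Let $\mathbf{u}(t)$ solve $\dot{\mathbf{u}} = A\mathbf{u}$. Let $\mathbf{v}(t) = \mathbf{u}(-t)$ be its time reversal.

(a) Write down the linear system $\dot{\mathbf{v}} = B\mathbf{v}$ satisfied by $\mathbf{v}(t)$. Then classify the following
statements as true or false. As always, explain your answers. (b) If $\dot{\mathbf{u}} = A\mathbf{u}$ is
asymptotically stable, then $\dot{\mathbf{v}} = B\mathbf{v}$ is unstable. (c) If $\dot{\mathbf{u}} = A\mathbf{u}$ is unstable, then
$\dot{\mathbf{v}} = B\mathbf{v}$ is asymptotically stable. (d) If $\dot{\mathbf{u}} = A\mathbf{u}$ is stable, then $\dot{\mathbf{v}} = B\mathbf{v}$ is stable.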

10.2.14. True or false: (a) If $\operatorname{tr} A > 0$, then the system $\dot{\mathbf{u}} = A\mathbf{u}$ is unstable.

(b) If $\det A > 0$, then the system $\dot{\mathbf{u}} = A\mathbf{u}$ is unstable.
10.2.15. True or false: If $K$ is positive semi-definite, then the zero solution to $\dot{\mathbf{u}} = -K\mathbf{u}$ is stable.

_  r\ 

10.2.16.  True  or  false:  If  A  is  a  symmetric  matrix,  then  the  system  u  =  —Aw  has  an 
asymptotically  stable  equilibrium  solution. 

10.2.17. Consider the differential equation $\dot{\mathbf{u}} = -K\mathbf{u}$, where $K$ is positive semi-definite.

(a) Find all equilibrium solutions. (b) Prove that all non-constant solutions decay
exponentially fast to some equilibrium. What is the decay rate? (c) Is the origin stable,
asymptotically stable, or unstable? (d) Prove that, as $t \to \infty$, the solution $\mathbf{u}(t)$ converges
to the orthogonal projection of its initial vector $\mathbf{a} = \mathbf{u}(0)$ onto $\ker K$.


10.2.18. Suppose that $\mathbf{u}(t)$ satisfies the gradient flow system (10.22).

(a) Prove that $\dfrac{d}{dt}\, q(\mathbf{u}) = -\| K \mathbf{u} \|^2$.

(b) Explain why if $\mathbf{u}(t)$ is any nonconstant solution to the gradient flow, then $q(\mathbf{u}(t))$ is a
strictly decreasing function of $t$, thus quantifying how fast a gradient flow decreases energy.


10.2.19. Let $H(u, v) = a u^2 + b u v + c v^2$ be a quadratic function. (a) Prove that the non-equilibrium
trajectories of the associated Hamiltonian system and those of the gradient flow
are mutually orthogonal, i.e., they always intersect at right angles. (b) Verify this result
for the particular quadratic functions (i) $u^2 + 3v^2$, (ii) $uv$, by drawing representative
trajectories of both systems on the same graph.


10.2.20. True or false: If the Hamiltonian system for $H(u, v)$ is stable, then the corresponding
gradient flow $\dot{\mathbf{u}} = -\nabla H$ is stable.

10.2.21.  True  or  false:  A  nonzero  linear  2x2  gradient  flow  cannot  be  a  Hamiltonian  flow. 

♦ 10.2.22. The law of conservation of energy states that the energy in a Hamiltonian system
is constant on solutions. (a) Prove that if $\mathbf{u}(t)$ satisfies the Hamiltonian system (10.23),
then $H(\mathbf{u}(t)) = c$ is a constant, and hence solutions $\mathbf{u}(t)$ move along the level sets of
the Hamiltonian or energy function. Explain how the value of $c$ is determined by the
initial conditions. (b) Plot the level curves of the particular Hamiltonian function
$H(u, v) = u^2 - 2uv + 2v^2$ and verify that they coincide with the solution trajectories.


10.2.23.  True  or  false:  A  nonzero  linear  2x2  gradient  flow  cannot  be  a  Hamiltonian  system. 

10.2.24. (a) Explain how to solve the inhomogeneous system $\dfrac{d\mathbf{u}}{dt} = A\mathbf{u} + \mathbf{b}$ when $\mathbf{b}$ is a
constant vector belonging to $\operatorname{img} A$. Hint: Look at $\mathbf{v}(t) = \mathbf{u}(t) - \mathbf{u}^*$ where $\mathbf{u}^*$ is an
equilibrium solution. (b) Use your method to solve


(0 


du 

dt 


u 


3v  A  1, 


dv 

dt 


, ...  du  .  _  dv 

u~v ’  {w)M=4v  +  2'Tt 


U 


3. 


♦ 10.2.25. Prove Lemma 10.15.

♦ 10.2.26. Prove Proposition 10.18.


10.3  Two-Dimensional  Systems 


The two-dimensional case is particularly instructive, since it is relatively easy to analyze,
but already manifests most of the key phenomena to be found in higher dimensions. Moreover,
the solutions can be easily pictured and their behavior understood through their
phase portraits. In this section, we will present a complete classification of the possible
qualitative behaviors of real, planar linear dynamical systems.

Setting $\mathbf{u}(t) = (u(t), v(t))^T$, a first order planar homogeneous linear system has the
explicit form

$$\frac{du}{dt} = a u + b v, \qquad \frac{dv}{dt} = c u + d v, \tag{10.26}$$

where $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is the (constant) coefficient matrix. As in Section 10.1, we will refer
to the $uv$-plane as the phase plane. In particular, the phase plane equivalents (10.8) of
second order scalar equations form a subclass thereof.

According to (8.21), the characteristic equation for the given $2 \times 2$ matrix is

$$\det(A - \lambda I) = \lambda^2 - \tau \lambda + \delta = 0, \tag{10.27}$$

where

$$\tau = \operatorname{tr} A = a + d, \qquad \delta = \det A = ad - bc, \tag{10.28}$$

are, respectively, the trace and the determinant of $A$. The eigenvalues, and hence the
nature of the solutions, are almost entirely determined by these two quantities. The sign
of the discriminant

$$\Delta = \tau^2 - 4\delta = (\operatorname{tr} A)^2 - 4 \det A = (a - d)^2 + 4bc \tag{10.29}$$

determines whether the eigenvalues

$$\lambda_{\pm} = \tfrac{1}{2} \left( \tau \pm \sqrt{\Delta} \right) \tag{10.30}$$

are real or complex, and thereby plays a key role in the classification.

Let us summarize the different possibilities as distinguished by their qualitative behavior.
Each category will be illustrated by a representative phase portrait, which displays
several typical solution trajectories in the phase plane. A complete portrait gallery of
planar systems can be found in Figure 10.3.
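The role of the trace, determinant, and discriminant can be made concrete with a short computation. The following sketch, assuming NumPy, evaluates the three quantities for a real $2 \times 2$ matrix and recovers the eigenvalues from the quadratic formula applied to the characteristic equation; the function name `planar_eigenvalues` is hypothetical, and the test matrix is $-K$ from Example 10.20.

```python
import numpy as np

def planar_eigenvalues(A):
    """Trace, determinant, discriminant, and eigenvalues of a real 2x2 matrix."""
    tau = A[0, 0] + A[1, 1]                          # trace
    delta = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]    # determinant
    disc = tau**2 - 4.0 * delta                      # discriminant
    root = np.sqrt(complex(disc))                    # complex sqrt handles disc < 0
    return tau, delta, disc, ((tau - root) / 2.0, (tau + root) / 2.0)

A = np.array([[-1.0, -1.0],
              [-1.0, -5.0]])          # the matrix -K from Example 10.20
tau, delta, disc, (lam1, lam2) = planar_eigenvalues(A)

print(tau, delta, disc)
print(lam1, lam2)
```

Here the discriminant is positive, so both eigenvalues are real; a negative discriminant would produce a complex conjugate pair with real part equal to half the trace.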


Distinct  Real  Eigenvalues 

The coefficient matrix $A$ has two distinct real eigenvalues $\lambda_1 < \lambda_2$ if and only if the
discriminant is positive: $\Delta > 0$. In this case, the solutions take the exponential form

$$\mathbf{u}(t) = c_1 e^{\lambda_1 t}\, \mathbf{v}_1 + c_2 e^{\lambda_2 t}\, \mathbf{v}_2, \tag{10.31}$$

where $\mathbf{v}_1, \mathbf{v}_2$ are the eigenvectors and $c_1, c_2$ are arbitrary constants, to be determined by
the initial conditions. Let $V_k = \{\, c\, \mathbf{v}_k \mid c \in \mathbb{R} \,\}$ for $k = 1, 2$, denote the two “eigenlines”,
i.e., the one-dimensional eigenspaces.

The asymptotic behavior of the solutions is governed by the eigenvalues. There are five
qualitatively different cases, depending upon their signs. These are listed by their descriptive
name, followed by the required conditions on the discriminant, trace, and determinant
of the coefficient matrix that serve to prescribe the form of the eigenvalues.

Ia. Stable Node: $\Delta > 0$, $\operatorname{tr} A < 0$, $\det A > 0$.

If $\lambda_1 < \lambda_2 < 0$ are both negative, then $\mathbf{0}$ is an asymptotically stable node. The solutions
all tend to $\mathbf{0}$ as $t \to \infty$. Since the first exponential $e^{\lambda_1 t}$ decreases much faster than the
second $e^{\lambda_2 t}$, the first term in the solution (10.31) will soon become negligible, and hence
$\mathbf{u}(t) \approx c_2 e^{\lambda_2 t} \mathbf{v}_2$ when $t$ is large, provided $c_2 \neq 0$. Such solutions will arrive at the origin
along curves tangent to the eigenline $V_2$, including those with $c_1 = 0$, which move directly
along the eigenline. On the other hand, the solutions with $c_2 = 0$ come in to the origin
along the eigenline $V_1$, at a faster rate. Conversely, as $t \to -\infty$, all solutions become
unbounded: $\| \mathbf{u}(t) \| \to \infty$. In this case, the first exponential grows faster than the second,
and so $\mathbf{u}(t) \approx c_1 e^{\lambda_1 t} \mathbf{v}_1$ for $t \ll 0$. In other words, as they escape to $\infty$, the solution
trajectories become more and more parallel to the eigenline $V_1$, except for those with
$c_1 = 0$, which remain on the eigenline $V_2$.

Ib. Saddle Point: $\Delta > 0$, $\det A < 0$.

If $\lambda_1 < 0 < \lambda_2$, then $\mathbf{0}$ is an unstable saddle point. Solutions (10.31) with $c_2 = 0$ start
out on the eigenline $V_1$ and go in to $\mathbf{0}$ as $t \to \infty$, while solutions with $c_1 = 0$ start on $V_2$
and go to $\mathbf{0}$ as $t \to -\infty$. All other solutions become unbounded at both large positive and
large negative times. As $t \to +\infty$, they asymptotically approach the unstable eigenline $V_2$,
while as $t \to -\infty$, they approach the stable eigenline $V_1$.

Ic. Unstable Node: $\Delta > 0$, $\operatorname{tr} A > 0$, $\det A > 0$.

If the eigenvalues $0 < \lambda_1 < \lambda_2$ are both positive, then $\mathbf{0}$ is an unstable node. The phase
portrait is the same as that of a stable node, but the solution trajectories are traversed
in the opposite direction. Time reversal $t \to -t$ will convert an unstable node into a
stable node and vice versa. Thus, in the unstable case, the solutions all tend to the origin
as $t \to -\infty$ and become unbounded as $t \to \infty$. Except for the eigensolutions, they
asymptotically approach $V_1$ as $t \to -\infty$, and become parallel to $V_2$ as $t \to \infty$.

Id. Stable Line: $\Delta > 0$, $\operatorname{tr} A < 0$, $\det A = 0$.

If $\lambda_1 < \lambda_2 = 0$, then every point on the eigenline $V_2$ associated with the zero eigenvalue
is a stable equilibrium point. The other solutions move along straight lines parallel to $V_1$,
asymptotically approaching one of the equilibrium points on $V_2$ as $t \to \infty$. On the other
hand, as $t \to -\infty$, all solutions except those sitting still on the eigenline move off to $\infty$.

Ie. Unstable Line: $\Delta > 0$, $\operatorname{tr} A > 0$, $\det A = 0$.

This is merely the time reversal of a stable line. If $0 = \lambda_1 < \lambda_2$, then every point on
the eigenline $V_1$ is an equilibrium. The other solutions move off to $\infty$ along straight lines
parallel to $V_2$ as $t \to \infty$, and tend to an equilibrium on $V_1$ as $t \to -\infty$.


Complex  Conjugate  Eigenvalues 


The coefficient matrix $A$ has two complex conjugate eigenvalues

$$\lambda_{\pm} = \mu \pm i \nu, \qquad \text{where} \qquad \mu = \tfrac{1}{2} \tau = \tfrac{1}{2} \operatorname{tr} A, \qquad \nu = \tfrac{1}{2} \sqrt{-\Delta},$$

if and only if its discriminant is negative: $\Delta < 0$. In this case, the real solutions can be
written in the phase-amplitude form

$$\mathbf{u}(t) = r e^{\mu t} \left[ \cos(\nu t - \sigma)\, \mathbf{w} - \sin(\nu t - \sigma)\, \mathbf{z} \right], \tag{10.32}$$

where $\mathbf{w} \pm i \mathbf{z}$ are the complex eigenvectors. As noted in Exercise 8.3.12, the real vectors
$\mathbf{w}, \mathbf{z}$ are always linearly independent. The amplitude $r$ and phase shift $\sigma$ are uniquely
prescribed by the initial conditions. There are three subcases, depending upon the sign of
the real part $\mu$, or, equivalently, the sign of the trace of $A$.

IIa. Stable Focus: $\Delta < 0$, $\operatorname{tr} A < 0$.

If $\mu < 0$, then $\mathbf{0}$ is an asymptotically stable focus. As $t \to \infty$, the solutions all spiral
in to the origin at an exponential rate $e^{\mu t}$ with a common “frequency” $\nu$, meaning it
takes time $2\pi/\nu$ for the solution to spiral once around the origin†. On the other hand, as
$t \to -\infty$, the solutions spiral off to $\infty$ at the same exponential rate whilst maintaining
their overall frequency.

† But keep in mind that these solutions are not periodic. Thus, $2\pi/\nu$ is the time interval between
successive intersections of the solution and a fixed ray emanating from the origin, e.g., the positive
x-axis.

IIb. Center: $\Delta < 0$, $\operatorname{tr} A = 0$.

If $\mu = 0$, meaning that the eigenvalues $\lambda_{\pm} = \pm\, i \nu$ are purely imaginary, then $\mathbf{0}$ is a
center. The solutions all move periodically around elliptical orbits, with common frequency
$\nu$ and hence period $2\pi/\nu$. In particular, solutions that start out near $\mathbf{0}$ stay nearby, and
hence a center is a stable, but not asymptotically stable, equilibrium.

IIc. Unstable Focus: $\Delta < 0$, $\operatorname{tr} A > 0$.

If $\mu > 0$, then $\mathbf{0}$ is an unstable focus. The phase portrait is the time reversal of a stable
focus, with solutions having an unbounded spiral motion as $t \to \infty$, and spiraling in to the
origin as $t \to -\infty$, again at an exponential rate $e^{\mu t}$ with a common “frequency” $\nu$.

Incomplete  Double  Real  Eigenvalue 

The coefficient matrix has a double real eigenvalue $\lambda = \tfrac{1}{2} \tau = \tfrac{1}{2} \operatorname{tr} A$ if and only if the
discriminant vanishes: $\Delta = 0$. The formula for the solutions depends on whether the
eigenvalue $\lambda$ is complete. If $\lambda$ is an incomplete eigenvalue, admitting only one independent
eigenvector $\mathbf{v}$, then the solutions are no longer given by simple exponentials. The general
solution formula is

$$\mathbf{u}(t) = (c_1 + c_2 t)\, e^{\lambda t}\, \mathbf{v} + c_2 e^{\lambda t}\, \mathbf{w}, \tag{10.33}$$

where $(A - \lambda I) \mathbf{w} = \mathbf{v}$, and so $\mathbf{v}, \mathbf{w}$ form a Jordan chain for the coefficient matrix. We let
$V = \{ c\, \mathbf{v} \}$ denote the eigenline associated with the genuine eigenvector $\mathbf{v}$.

IIIa. Stable Improper Node: $\Delta = 0$, $\operatorname{tr} A < 0$, $A \neq \lambda I$.

If $\lambda < 0$ then $\mathbf{0}$ is an asymptotically stable improper node. Since $t e^{\lambda t}$ is larger than $e^{\lambda t}$
for $t > 1$, when $c_2 \neq 0$, the solutions $\mathbf{u}(t) \approx c_2 t e^{\lambda t} \mathbf{v}$ tend to $\mathbf{0}$ as $t \to \infty$ along a curve
that is tangent to the eigenline $V$, while the eigensolutions with $c_2 = 0$ move in to the
origin along the eigenline. Similarly, as $t \to -\infty$, the solutions go off to $\infty$ in the opposite
direction from their approach, becoming more and more parallel to the same eigenline.

IIIb. Linear Motion: $\Delta = 0$, $\operatorname{tr} A = 0$, $A \neq \lambda I$.

If $\lambda = 0$, then every point on the eigenline $V$ is an unstable equilibrium point. Every
other solution is a linear polynomial in $t$, and so moves along a straight line parallel to $V$,
going off to $\infty$ in either direction.

IIIc. Unstable Improper Node: $\Delta = 0$, $\operatorname{tr} A > 0$, $A \neq \lambda I$.

If $\lambda > 0$, then $\mathbf{0}$ is an unstable improper node. The phase portrait is the time reversal
of the stable improper node. Solutions go off to $\infty$ as $t$ increases, becoming progressively
more parallel to the eigenline, and tend to the origin tangent to the eigenline as $t \to -\infty$.

Complete  Double  Real  Eigenvalue 

In this case, every vector in $\mathbb{R}^2$ is an eigenvector, and so the real solutions take the form
$\mathbf{u}(t) = e^{\lambda t}\, \mathbf{v}$, where $\mathbf{v}$ is an arbitrary constant vector. In fact, this case occurs if and only
if $A = \lambda I$ is a scalar multiple of the identity matrix.
Figure 10.3. [Phase portrait panels: Ia. Stable Node; Ib. Saddle Point; Ic. Unstable Node; IIb. Center; IIIa. Stable Improper Node; IIIb. Linear Motion; IIIc. Unstable Improper Node; Id. Stable Line; Ie. Unstable Line.]

Figure 10.4. Stability Regions for Two-Dimensional Linear Systems.

IVa. Stable Star: $A = \lambda I$, $\lambda < 0$.

If $\lambda < 0$, then $\mathbf{0}$ is an asymptotically stable star. The solution trajectories are the rays
coming in to the origin, and the solutions go to $\mathbf{0}$ at a common exponential rate $e^{\lambda t}$ as
$t \to \infty$.

IVb. Trivial: $A = O$.

If $\lambda = 0$, then the only possibility is $A = O$. Now every solution is constant and every
point is a (stable) equilibrium point. Nothing happens! This is the only case not pictured
in Figure 10.3.

IVc. Unstable Star: $A = \lambda I$, $\lambda > 0$.

If $\lambda > 0$, then $\mathbf{0}$ is an unstable star. The phase portrait is the time reversal of the stable
star, and so the solutions move out along rays as $t \to \infty$ at an exponential rate $e^{\lambda t}$, while
tending to $\mathbf{0}$ as $t \to -\infty$.
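The case-by-case classification above can be collected into a single routine. The following is a sketch rather than a definitive implementation: the function name `classify_planar` and the tolerance `tol` are assumptions, and tests for exact equality (zero trace, determinant, or discriminant) are inherently fragile in floating-point arithmetic, so the borderline cases are only detected up to `tol`.

```python
import numpy as np

def classify_planar(A, tol=1e-12):
    """Name the phase portrait of du/dt = A u for a real 2x2 matrix A."""
    A = np.asarray(A, dtype=float)
    tau = A[0, 0] + A[1, 1]                          # trace
    delta = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]    # determinant
    disc = tau**2 - 4.0 * delta                      # discriminant

    def by_sign(value, negative, zero, positive):
        if value < -tol:
            return negative
        if value > tol:
            return positive
        return zero

    if disc > tol:                                   # distinct real eigenvalues
        if delta < -tol:
            return "Ib. saddle point"
        if delta > tol:
            return "Ia. stable node" if tau < 0 else "Ic. unstable node"
        return "Id. stable line" if tau < 0 else "Ie. unstable line"
    if disc < -tol:                                  # complex conjugate eigenvalues
        return by_sign(tau, "IIa. stable focus", "IIb. center", "IIc. unstable focus")

    lam = tau / 2.0                                  # double real eigenvalue
    if np.allclose(A, lam * np.eye(2), atol=tol):    # complete: A = lam I
        return by_sign(lam, "IVa. stable star", "IVb. trivial", "IVc. unstable star")
    return by_sign(lam, "IIIa. stable improper node", "IIIb. linear motion",
                   "IIIc. unstable improper node")

print(classify_planar([[-1.0, -1.0], [-1.0, -5.0]]))
print(classify_planar([[0.0, 1.0], [-1.0, 0.0]]))
```

For example, the matrix $-K$ of Example 10.20 has trace $-6$, determinant $4$, and discriminant $20$, so it is reported as a stable node, while the rotation generator in the second call is a center.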

Figure 10.4 summarizes the different possibilities, as prescribed by the trace and determinant
of the coefficient matrix. The horizontal axis indicates the value of $\tau = \operatorname{tr} A$, while
the vertical axis refers to $\delta = \det A$. Points on the parabola $\tau^2 = 4\delta$ represent the cases
with vanishing discriminant $\Delta = 0$, and correspond to either stars or improper nodes
except for the origin, which is either linear motion or trivial. All the asymptotically stable
cases lie in the shaded upper left quadrant where $\operatorname{tr} A < 0$ and $\det A > 0$. The borderline
points are either stable centers, when $\operatorname{tr} A = 0$, $\det A > 0$, or stable lines, when $\operatorname{tr} A < 0$,
$\det A = 0$, or the origin, which may or may not be stable depending upon whether $A$ is
the zero matrix or not. All other values for the trace and determinant result in unstable
equilibria. Summarizing:
Proposition 10.22. Let $\tau, \delta$ denote, respectively, the trace and determinant of the coefficient
matrix $A$ of a homogeneous, linear, autonomous planar system of first order ordinary
differential equations. Then the system is

(i) asymptotically stable if and only if $\delta > 0$ and $\tau < 0$;

(ii) stable if and only if $\delta \geq 0$, $\tau \leq 0$, and, if $\delta = \tau = 0$, also $A = O$.

Remark. Time reversal $t \to -t$ changes the sign of the coefficient matrix $A \to -A$,
and hence the sign of its trace, $\tau \to -\tau$, while the determinant $\delta = \det A = \det(-A)$ is
unchanged. Thus, the effect is to reflect Figure 10.4 through the vertical axis, interchanging
the stable nodes and spirals with their unstable counterparts, while taking saddle points
to saddle points.

In physical applications, the parameters occurring in the dynamical system are usually
not known exactly, and so the real dynamics may, in fact, be governed by a slight
perturbation of the mathematical model. Thus, it is important to know which systems
are structurally stable, meaning that their basic qualitative features are preserved under
sufficiently  small  changes  in  the  coefficients.  Now,  a  small  perturbation  will  alter  the  co¬ 
efficient  matrix  slightly,  and  hence  shift  its  trace  and  determinant  by  a  comparably  small 
amount.  The  net  effect  is  to  slightly  perturb  its  eigenvalues.  Therefore,  the  question  of 
structural  stability  reduces  to  whether  the  eigenvalues  have  moved  sufficiently  far  to  send 
the  system  into  a  different  stability  regime.  Asymptotically  stable  systems  remain  asymp¬ 
totically  stable  since  a  sufficiently  small  perturbation  will  not  alter  the  signs  of  the  real 
parts  of  its  eigenvalues.  For  a  similar  reason,  unstable  systems  remain  unstable  under 
small  perturbations.  On  the  other  hand,  a  borderline  stable  system  —  either  a  center  or 
the  trivial  system  —  might  become  either  asymptotically  stable  or  unstable,  even  under  a 
minuscule perturbation. Such results continue to hold, at least locally, even under suitably
small nonlinear perturbations, and thereby lie at the foundations of nonlinear dynamics.

Structural  stability  requires  a  bit  more,  since  the  overall  phase  portrait  should  not 
significantly  change.  A  system  in  any  of  the  open  regions  in  the  Stability  Figure  10.4,  i.e., 
a  stable  or  unstable  focus,  a  stable  or  unstable  node,  or  a  saddle  point,  is  structurally  stable, 
whereas a system that lies on the parabola $\tau^2 = 4\delta$, or the horizontal axis, or the positive
vertical axis, e.g., an improper node, a stable line, etc., is not, since a small perturbation
can easily kick it into either of the adjoining regions. Thus, structural stability requires
that the eigenvalues be distinct, $\lambda_i \neq \lambda_j$, and have non-zero real part: $\operatorname{Re} \lambda \neq 0$. This final
result  remains  valid  for  linear  systems  in  higher  dimensions,  [36,41].  See  also  [69,90] 
and  the  brief  remarks  on  page  525  concerning  the  perturbation  theory  of  eigenvalues,  in 
which  Wilkinson’s  spectral  condition  number  quantifies  to  what  extent  the  eigenvalues  are 
affected  by  a  perturbation  of  the  coefficient  matrix. 
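As a rough numerical illustration of these remarks, the following Python sketch computes the eigenvalues of a $2 \times 2$ matrix from its trace and determinant and records the signs of their real parts. The helper names and the tolerance `tol` are assumptions made for this example. A center is pushed into a different stability regime by a tiny perturbation, while a stable node is not.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Roots of lambda^2 - tau*lambda + delta = 0 for A = [[a, b], [c, d]]."""
    tau = a + d
    delta = a * d - b * c
    root = cmath.sqrt(tau * tau - 4 * delta)
    return (tau + root) / 2, (tau - root) / 2


def real_part_signs(a, b, c, d, tol=1e-12):
    """Sign (+1, 0, -1) of the real part of each eigenvalue."""
    signs = []
    for lam in eigenvalues_2x2(a, b, c, d):
        if lam.real > tol:
            signs.append(1)
        elif lam.real < -tol:
            signs.append(-1)
        else:
            signs.append(0)
    return signs


eps = 1e-6
print(real_part_signs(0, -1, 1, 0))        # center: purely imaginary eigenvalues
print(real_part_signs(eps, -1, 1, eps))    # perturbed center: unstable focus
print(real_part_signs(-eps, -1, 1, -eps))  # perturbed center: stable focus
print(real_part_signs(-2 + eps, 1, 2, -3)) # perturbed stable node: still stable
```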


Exercises 


10.3.1. For each of the following: (a) Write the system as $\dot u = A u$. (b) Find the eigenvalues and
eigenvectors of $A$. (c) Find the general real solution of the system. (d) Draw the phase
portrait, indicating its type and stability properties:
(i) $\dot u_1 = -u_2$, $\dot u_2 = 9 u_1$; (ii) $\dot u_1 = 2u_1 - 3u_2$, $\dot u_2 = u_1 - u_2$; (iii) $\dot u_1 = 3u_1 - 2u_2$, $\dot u_2 = 2u_1 - 2u_2$.


10.3.2.  For  each  of  the  following  systems 


5/2 

2 


u:

(a) Find the general real solution. (b) Using the solution formulas obtained in part (a),
plot several trajectories of each system. On your graphs, identify the eigenlines (if relevant),
and the direction of increasing $t$ on the trajectories. (c) Write down the type and stability
properties of the system.


10.3.3. Classify the following systems, and sketch their phase portraits.

(a) $\dfrac{du}{dt} = -u + 4v$, $\dfrac{dv}{dt} = u - 2v$. (b) $\dfrac{du}{dt} = -2u + v$, $\dfrac{dv}{dt} = u - 4v$.

(c) $\dfrac{du}{dt} = 5u + 4v$, $\dfrac{dv}{dt} = u + 2v$. (d) $\dfrac{du}{dt} = -3u - 2v$, $\dfrac{dv}{dt} = 3u + 2v$.

♦ 10.3.4. Justify the solution formulas (10.32) and (10.33).

u —  u ^  3a2, 

10.3.5.  Sketch  the  phase  portrait  for  the  following  systems:  (a) 

a<2  —  3a1  u<2 • 


U1=u1-  4u2,  ul=u1+u2 

(b)  .  (c)  . 


a. 


u 


a 


2- 


a2  —  4a1 


2  a2 . 


U-l  =  +U2, 

(d)  . 


Uc 


a 


2- 


Uj  =  +  §M2, 

(e)  .  5  , 

u2  =  ~lul  +  lu2- 


10.3.6.  Which  of  the  14  possible  two-dimensional  phase  portraits  can  occur  for  the  phase  plane 
equivalent  (10.8)  of  a  second  order  scalar  ordinary  differential  equation? 

10.3.7.  Which  of  the  14  possible  two-dimensional  phase  portraits  can  occur 

(a)  for  a  linear  gradient  flow  (10.19)?  (b)  for  a  linear  Hamiltonian  system  (10.25)? 

10.3.8. (a) Solve the initial value problem $\dfrac{du}{dt} = \begin{pmatrix} -1 & 2 \\ -1 & -3 \end{pmatrix} u$, $u(0) = \begin{pmatrix} 1 \\ 3 \end{pmatrix}$.

(b) Sketch a picture of your solution curve $u(t)$, indicating the direction of motion.

(c) Is the origin (i) stable? (ii) asymptotically stable? (iii) unstable? (iv) none of these?
Justify your answer.


10.4  Matrix  Exponentials 

So far, our focus has been on vector-valued solutions $u(t)$ to homogeneous linear systems
of ordinary differential equations

$$\frac{du}{dt} = A\,u. \tag{10.34}$$

An evident, and, in fact, useful, generalization is to look for matrix solutions. Specifically,
we seek a matrix-valued function $U(t)$ that satisfies the corresponding matrix differential
equation

$$\frac{dU}{dt} = A\,U(t). \tag{10.35}$$

As with vectors, we compute the derivative of $U(t)$ by differentiating its individual entries.
If $A$ is an $n \times n$ matrix, compatibility of matrix multiplication requires that $U(t)$ be of size
$n \times k$ for some $k$. Since matrix multiplication acts column-wise, the individual columns of
the matrix solution $U(t) = (\, u_1(t) \ \ldots \ u_k(t) \,)$ must solve the original vector system (10.34).
Thus, a matrix solution is merely a convenient way of collecting together several different
vector solutions. The most important case is that in which $U(t)$ is a square matrix, of size
$n \times n$, and so consists of $n$ vector solutions to the system.

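Example 10.23. According to Example 10.7, the vector-valued functions

$$u_1(t) = \begin{pmatrix} e^{-4t} \\ -2e^{-4t} \end{pmatrix}, \qquad u_2(t) = \begin{pmatrix} e^{-t} \\ e^{-t} \end{pmatrix}$$

are both solutions to the linear system

$$\frac{du}{dt} = \begin{pmatrix} -2 & 1 \\ 2 & -3 \end{pmatrix} u.$$

They can be combined to form the matrix solution

$$U(t) = \begin{pmatrix} e^{-4t} & e^{-t} \\ -2e^{-4t} & e^{-t} \end{pmatrix} \qquad \text{satisfying} \qquad \frac{dU}{dt} = \begin{pmatrix} -2 & 1 \\ 2 & -3 \end{pmatrix} U.$$

Indeed, by direct calculation

$$\frac{dU}{dt} = \begin{pmatrix} -4e^{-4t} & -e^{-t} \\ 8e^{-4t} & -e^{-t} \end{pmatrix} = \begin{pmatrix} -2 & 1 \\ 2 & -3 \end{pmatrix} \begin{pmatrix} e^{-4t} & e^{-t} \\ -2e^{-4t} & e^{-t} \end{pmatrix} = \begin{pmatrix} -2 & 1 \\ 2 & -3 \end{pmatrix} U.$$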

The existence and uniqueness theorems are readily adapted to matrix differential equations,
and imply that there is a unique matrix solution to the system (10.35) that has
initial conditions

$$U(t_0) = B, \tag{10.36}$$

where $B$ is an $n \times k$ matrix. Note that the $j$th column $u_j(t)$ of the matrix solution $U(t)$
satisfies the initial value problem

$$\frac{du_j}{dt} = A\,u_j, \qquad u_j(t_0) = b_j,$$

where $b_j$ denotes the $j$th column of $B$.

In the scalar case, the solution to the particular initial value problem

$$\frac{du}{dt} = a\,u, \qquad u(0) = 1,$$

is the ordinary exponential function $u(t) = e^{ta}$. Knowing this, we can write down the
solution for a more general initial condition

$$u(t_0) = b \qquad \text{as} \qquad u(t) = b\,e^{(t - t_0)a}.$$

Let us formulate an analogous initial value problem for a linear system. Recall that, for
matrices, the role of the multiplicative unit 1 is played by the identity matrix $I$. This
inspires the following definition.


Definition 10.24. Let $A$ be a square $n \times n$ matrix. The matrix exponential

$$U(t) = e^{tA} = \exp(tA) \tag{10.37}$$

is the unique $n \times n$ matrix solution to the initial value problem

$$\frac{dU}{dt} = A\,U, \qquad U(0) = I. \tag{10.38}$$

In particular, one computes $e^A$ by setting $t = 1$ in the matrix exponential $e^{tA}$. The
matrix exponential turns out to enjoy almost all the properties you might expect from its
scalar counterpart. First, it is defined for all $t \in \mathbb{R}$, and all $n \times n$ matrices, both real and
complex. We can rewrite the defining properties (10.38) in the more suggestive form

$$\frac{d}{dt}\, e^{tA} = A\,e^{tA}, \qquad e^{0A} = I. \tag{10.39}$$

As in the scalar case, once we know the matrix exponential, we are in a position to solve
the general initial value problem.

Lemma 10.25. Let $A$ be an $n \times n$ matrix. For any $n \times k$ matrix $B$, the solution to the
initial value problem

$$\frac{dU}{dt} = A\,U, \quad U(t_0) = B, \qquad \text{is} \qquad U(t) = e^{(t - t_0)A}\,B. \tag{10.40}$$

Proof: Since $B$ is a constant matrix,

$$\frac{dU}{dt} = \frac{d}{dt}\Bigl( e^{(t - t_0)A} \Bigr) B = A\,e^{(t - t_0)A}\,B = A\,U,$$

where we applied the chain rule for differentiation and the first property (10.39). Thus,
$U(t)$ is indeed a matrix solution to the system. Moreover, by the second property in (10.39),

$$U(t_0) = e^{0A} B = I\,B = B$$

has the correct initial conditions. Q.E.D.


Remark. The computation used in the proof is a special instance of the general Leibniz
rule

$$\frac{d}{dt}\bigl[\, M(t)\,N(t) \,\bigr] = \frac{dM(t)}{dt}\,N(t) + M(t)\,\frac{dN(t)}{dt} \tag{10.41}$$

for the derivative of the product of (compatible) matrix-valued functions $M(t)$ and $N(t)$.
The reader is asked to prove this formula in Exercise 10.4.21.


In particular, the solution to the original vector initial value problem

$$\frac{du}{dt} = A\,u, \qquad u(t_0) = b,$$

can be written in terms of the matrix exponential:

$$u(t) = e^{(t - t_0)A}\,b. \tag{10.42}$$

Thus, the matrix exponential provides us with an alternative formula for the solution of
autonomous homogeneous first order linear systems, providing us with valuable new insight.

The next step is to find an algorithm for computing the matrix exponential. The
solution formula (10.40) gives a hint. Suppose $U(t)$ is any $n \times n$ matrix solution. Then,
by uniqueness, $U(t) = e^{tA}\,U(0)$, and hence, provided that $U(0)$ is a nonsingular matrix,

$$e^{tA} = U(t)\,U(0)^{-1}, \tag{10.43}$$

since $e^{0A} = U(0)\,U(0)^{-1} = I$, as required. Thus, to construct the exponential of an $n \times n$
matrix $A$, you first need to find a basis of $n$ linearly independent solutions $u_1(t), \ldots, u_n(t)$
to the linear system $\dot u = A u$ using the eigenvalues and eigenvectors, or, in the incomplete
case, the Jordan chains. The resulting $n \times n$ matrix solution $U(t) = (\, u_1(t) \ \ldots \ u_n(t) \,)$ is
then used to produce $e^{tA}$ via formula (10.43).


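Example 10.26. For the matrix $A = \begin{pmatrix} -2 & 1 \\ 2 & -3 \end{pmatrix}$ considered in Example 10.23, we
already constructed the nonsingular matrix solution $U(t) = \begin{pmatrix} e^{-4t} & e^{-t} \\ -2e^{-4t} & e^{-t} \end{pmatrix}$. Therefore,
by (10.43), its matrix exponential is

$$e^{tA} = U(t)\,U(0)^{-1} = \begin{pmatrix} e^{-4t} & e^{-t} \\ -2e^{-4t} & e^{-t} \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -2 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} \frac13 e^{-4t} + \frac23 e^{-t} & -\frac13 e^{-4t} + \frac13 e^{-t} \\ -\frac23 e^{-4t} + \frac23 e^{-t} & \frac23 e^{-4t} + \frac13 e^{-t} \end{pmatrix}.$$

In particular, we obtain $e^A = \exp A$ by setting $t = 1$ in this formula:

$$\exp \begin{pmatrix} -2 & 1 \\ 2 & -3 \end{pmatrix} = \begin{pmatrix} \frac13 e^{-4} + \frac23 e^{-1} & -\frac13 e^{-4} + \frac13 e^{-1} \\ -\frac23 e^{-4} + \frac23 e^{-1} & \frac23 e^{-4} + \frac13 e^{-1} \end{pmatrix}.$$

Observe that the matrix exponential is not obtained by exponentiating the individual
matrix entries.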

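To solve the initial value problem

$$\frac{du}{dt} = \begin{pmatrix} -2 & 1 \\ 2 & -3 \end{pmatrix} u, \qquad u(0) = b = \begin{pmatrix} 3 \\ 0 \end{pmatrix},$$

we appeal to formula (10.40), whence

$$u(t) = e^{tA}\,b = \begin{pmatrix} \frac13 e^{-4t} + \frac23 e^{-t} & -\frac13 e^{-4t} + \frac13 e^{-t} \\ -\frac23 e^{-4t} + \frac23 e^{-t} & \frac23 e^{-4t} + \frac13 e^{-t} \end{pmatrix} \begin{pmatrix} 3 \\ 0 \end{pmatrix} = \begin{pmatrix} e^{-4t} + 2e^{-t} \\ -2e^{-4t} + 2e^{-t} \end{pmatrix}.$$

This reproduces our earlier solution (10.15).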
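The recipe $e^{tA} = U(t)\,U(0)^{-1}$ of formula (10.43) can be checked numerically. The sketch below is a minimal illustration, assuming $A = \begin{pmatrix} -2 & 1 \\ 2 & -3 \end{pmatrix}$ with eigensolutions $e^{-4t}(1,-2)^T$ and $e^{-t}(1,1)^T$. The helper names and the initial vector $b = (3, 0)^T$ are choices made for this example.

```python
import math


def U(t):
    """Matrix solution whose columns are the two eigensolutions."""
    return [[math.exp(-4 * t), math.exp(-t)],
            [-2 * math.exp(-4 * t), math.exp(-t)]]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def exp_tA(t):
    """Matrix exponential via e^{tA} = U(t) U(0)^{-1}."""
    return mat_mul(U(t), inverse_2x2(U(0.0)))


def solution(t, b):
    """Solution u(t) = e^{tA} b of du/dt = A u with u(0) = b."""
    E = exp_tA(t)
    return [E[0][0] * b[0] + E[0][1] * b[1],
            E[1][0] * b[0] + E[1][1] * b[1]]


print(exp_tA(0.0))               # identity matrix, up to rounding
print(solution(0.5, [3.0, 0.0])) # compare with e^{-4t} + 2e^{-t}, -2e^{-4t} + 2e^{-t}
```

Any other nonsingular matrix solution would serve equally well in place of `U`, since right multiplication by a constant nonsingular matrix cancels in $U(t)\,U(0)^{-1}$.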


Example 10.27. Suppose $A = \begin{pmatrix} -1 & -2 \\ 2 & -1 \end{pmatrix}$. Its characteristic equation

$$\det(A - \lambda I) = \lambda^2 + 2\lambda + 5 = 0 \qquad \text{has roots} \qquad \lambda = -1 \pm 2i,$$

which are thus the eigenvalues. The corresponding eigenvectors are $v = \begin{pmatrix} \pm i \\ 1 \end{pmatrix}$, leading
to the complex conjugate solutions

$$u_1(t) = \begin{pmatrix} i\,e^{(-1+2i)t} \\ e^{(-1+2i)t} \end{pmatrix}, \qquad u_2(t) = \begin{pmatrix} -i\,e^{(-1-2i)t} \\ e^{(-1-2i)t} \end{pmatrix}.$$

We assemble them to form the (complex) matrix solution

$$U(t) = \begin{pmatrix} i\,e^{(-1+2i)t} & -i\,e^{(-1-2i)t} \\ e^{(-1+2i)t} & e^{(-1-2i)t} \end{pmatrix}.$$

The corresponding matrix exponential is, therefore,

$$e^{tA} = U(t)\,U(0)^{-1} = \begin{pmatrix} i\,e^{(-1+2i)t} & -i\,e^{(-1-2i)t} \\ e^{(-1+2i)t} & e^{(-1-2i)t} \end{pmatrix} \begin{pmatrix} i & -i \\ 1 & 1 \end{pmatrix}^{-1}$$

$$= \begin{pmatrix} \dfrac{e^{(-1+2i)t} + e^{(-1-2i)t}}{2} & \dfrac{-e^{(-1+2i)t} + e^{(-1-2i)t}}{2i} \\[2ex] \dfrac{e^{(-1+2i)t} - e^{(-1-2i)t}}{2i} & \dfrac{e^{(-1+2i)t} + e^{(-1-2i)t}}{2} \end{pmatrix} = \begin{pmatrix} e^{-t}\cos 2t & -e^{-t}\sin 2t \\ e^{-t}\sin 2t & e^{-t}\cos 2t \end{pmatrix}.$$

Note that the final expression for the matrix exponential is real, as it must be, since $A$ is
a real matrix. (See Exercise 10.4.19.) Also note that it wasn't necessary to find the real
solutions to construct the matrix exponential, although this would also have worked and
yielded the same result. Indeed, the two columns of $e^{tA}$ form a basis for the space of (real)
solutions to the linear system $\dot u = A u$.
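The cancellation of imaginary parts can be confirmed with complex arithmetic. The following sketch assumes $A = \begin{pmatrix} -1 & -2 \\ 2 & -1 \end{pmatrix}$ with eigenvalues $-1 \pm 2i$ and eigenvectors $(\pm i, 1)^T$. It forms $U(t)\,U(0)^{-1}$ from the complex solutions and compares the result with the real formula $e^{-t}\begin{pmatrix} \cos 2t & -\sin 2t \\ \sin 2t & \cos 2t \end{pmatrix}$. The function names are hypothetical.

```python
import cmath
import math


def exp_tA_complex(t):
    """e^{tA} assembled from the two complex eigensolutions."""
    ep = cmath.exp(complex(-1, 2) * t)   # e^{(-1+2i)t}
    em = cmath.exp(complex(-1, -2) * t)  # e^{(-1-2i)t}
    U_t = [[1j * ep, -1j * em], [ep, em]]
    U_0 = [[1j, -1j], [1, 1]]
    det0 = U_0[0][0] * U_0[1][1] - U_0[0][1] * U_0[1][0]
    U0_inv = [[U_0[1][1] / det0, -U_0[0][1] / det0],
              [-U_0[1][0] / det0, U_0[0][0] / det0]]
    return [[sum(U_t[i][k] * U0_inv[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]


def exp_tA_real(t):
    """Closed-form real expression for the same matrix exponential."""
    e = math.exp(-t)
    return [[e * math.cos(2 * t), -e * math.sin(2 * t)],
            [e * math.sin(2 * t), e * math.cos(2 * t)]]


t = 0.7
for row_c, row_r in zip(exp_tA_complex(t), exp_tA_real(t)):
    for z, x in zip(row_c, row_r):
        print(f"{z.real:+.12f} {z.imag:+.1e}   vs   {x:+.12f}")
```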


Let us finish by listing some further important properties of the matrix exponential, all
of which are direct analogues of the usual scalar exponential function. Proofs are relegated
to the exercises. First, the multiplicative property says that

$$e^{(s+t)A} = e^{sA}\,e^{tA} \qquad \text{for all} \quad s, t \in \mathbb{R}. \tag{10.44}$$

In particular, if we set $s = -t$, the left hand side of (10.44) reduces to the identity matrix,
in accordance with the second identity in (10.39), and hence

$$e^{-tA}\,e^{tA} = I, \qquad \text{and hence} \qquad e^{-tA} = \bigl(e^{tA}\bigr)^{-1}. \tag{10.45}$$

As a consequence, for any $A$ and any $t \in \mathbb{R}$, the exponential $e^{tA}$ is a nonsingular matrix.

Warning. In general,

$$e^{t(A+B)} \neq e^{tA}\,e^{tB}. \tag{10.46}$$

Indeed, according to Proposition 10.30, the left- and right-hand sides of (10.46) are equal
for all $t$ if and only if $AB = BA$, that is, $A$ and $B$ are commuting matrices.

While  the  matrix  exponential  can  be  painful  to  compute,  there  is  a  simple  formula  for 
its  determinant  in  terms  of  the  trace  of  the  generating  matrix. 


Lemma 10.28. Let $A$ be a square matrix. Then $\det e^{tA} = e^{t \operatorname{tr} A}$.

Proof: According to Exercise 10.4.26, if $A$ has eigenvalues $\lambda_1, \ldots, \lambda_n$, then $e^{tA}$ has
eigenvalues $e^{t\lambda_1}, \ldots, e^{t\lambda_n}$. Moreover, using (8.26), its determinant, $\det e^{tA}$, is the product of
its eigenvalues, and so

$$\det e^{tA} = e^{t\lambda_1}\,e^{t\lambda_2} \cdots e^{t\lambda_n} = e^{t(\lambda_1 + \lambda_2 + \cdots + \lambda_n)} = e^{t \operatorname{tr} A},$$

where, by (8.25), we identify the sum of the eigenvalues as the trace of $A$. Q.E.D.

For instance, the matrix $A = \begin{pmatrix} -2 & 1 \\ 2 & -3 \end{pmatrix}$ considered above in Example 10.26 has
$\operatorname{tr} A = (-2) + (-3) = -5$, and hence $\det e^{tA} = e^{-5t}$, as you can easily check.
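The check can be carried out by hand: multiplying out the entries of $e^{tA}$ gives $\frac19\bigl[(e^{-4t} + 2e^{-t})(2e^{-4t} + e^{-t}) - (-e^{-4t} + e^{-t})(-2e^{-4t} + 2e^{-t})\bigr] = e^{-4t}e^{-t} = e^{-5t}$. A short Python sketch of the same verification follows. It assumes the closed-form entries of $e^{tA}$ for $A = \begin{pmatrix} -2 & 1 \\ 2 & -3 \end{pmatrix}$, and the function names are illustrative.

```python
import math


def exp_tA_closed_form(t):
    """Closed-form e^{tA} for A = [[-2, 1], [2, -3]]."""
    a, b = math.exp(-4 * t), math.exp(-t)
    return [[(a + 2 * b) / 3, (-a + b) / 3],
            [(-2 * a + 2 * b) / 3, (2 * a + b) / 3]]


def det_2x2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


for t in (0.0, 0.3, 1.7):
    lhs = det_2x2(exp_tA_closed_form(t))  # det e^{tA}
    rhs = math.exp(-5 * t)                # e^{t tr A}, with tr A = -5
    print(t, lhs, rhs)
```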
Finally, we note that the standard exponential series is also valid for matrices:

$$e^{tA} = \sum_{n=0}^{\infty} \frac{t^n}{n!}\,A^n = I + tA + \frac{t^2}{2}\,A^2 + \frac{t^3}{6}\,A^3 + \cdots. \tag{10.47}$$

To prove that the series converges, we use the matrix norm convergence criterion in Exercise
9.2.44(c). Indeed, the corresponding series of matrix norms is bounded by the scalar
exponential series,

$$\sum_{n=0}^{\infty} \left\| \frac{t^n}{n!}\,A^n \right\| = \sum_{n=0}^{\infty} \frac{|t|^n}{n!}\,\| A^n \| \leq \sum_{n=0}^{\infty} \frac{|t|^n}{n!}\,\| A \|^n = e^{|t|\,\|A\|},$$

which converges for all $t$, [2, 78], thereby proving convergence. With this in hand, proving
that the exponential series satisfies the defining initial value problem (10.39) is
straightforward:

$$\frac{d}{dt} \sum_{n=0}^{\infty} \frac{t^n}{n!}\,A^n = \sum_{n=1}^{\infty} \frac{t^{n-1}}{(n-1)!}\,A^n = \sum_{n=0}^{\infty} \frac{t^n}{n!}\,A^{n+1} = A \sum_{n=0}^{\infty} \frac{t^n}{n!}\,A^n.$$

Moreover, at $t = 0$, the sum collapses to the identity matrix: $I = e^{0A}$. Thus, formula
(10.47) follows from the uniqueness of solutions to the matrix initial value problem.
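A truncated version of the series gives a simple, if naive, way to approximate $e^{tA}$ numerically. The sketch below is an illustration only: the function name `exp_series` and the default of 40 terms are assumptions, and the partial sums can lose accuracy when $|t|\,\|A\|$ is large, which is why numerical libraries generally use more careful algorithms.

```python
import math


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def exp_series(A, t, terms=40):
    """Partial sum of the series I + tA + t^2 A^2/2! + ... (first `terms` terms)."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds t^k A^k / k!, starting at k = 0
    for k in range(1, terms):
        term = mat_mul(term, A)
        term = [[t * x / k for x in row] for row in term]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result


# A nilpotent matrix: the series terminates after two terms.
print(exp_series([[0.0, 1.0], [0.0, 0.0]], 2.0))
# The rotation generator: compare with cos(1), -sin(1), sin(1), cos(1).
print(exp_series([[0.0, -1.0], [1.0, 0.0]], 1.0), math.cos(1.0), math.sin(1.0))
```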


Exercises 


10.4.1. Find the exponentials of the following $2 \times 2$ matrices:


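10.4.2. Determine the matrix exponential $e^{tA}$ for the following matrices:

(a) $\begin{pmatrix} 0 & 0 & 0 \\ 2 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}$, (b) $\begin{pmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{pmatrix}$, (c) $\begin{pmatrix} -1 & 1 & 1 \\ -2 & -2 & -2 \\ 1 & -1 & -1 \end{pmatrix}$, (d) $\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$.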

10.4.3.  Verify  the  determinant  formula  of  Lemma  10.28  for  the  matrices  in  Exercises  10.4.1 
and  10.4.2. 


10.4.4.  Solve  the  indicated  initial  value  problems  by  first  exponentiating  the  coefficient  matrix 


and  then  applying  formula  (10.42):  (a) 

du 


du 

dt 


C b ) 


dt 


3  -6 

4  -7 


u,  u(0) 


1 

1 


(c) 


10.4.5.  Find  eA  when  A  = 


(a) 


5 

-2 


2 

5 


( b ) 


1 

1 


2 

1 


(c) 


2 

4 


-1 

-2 


0  -1 

1  0 


)u,  u(0)  =  (  _2  ) 


du 

dt 


(  —9 
8 

V-2 


u(0)  = 


( d ) 


(l 

0 

\o 


0 

2 

0 


(e) 


(  0 

-1 

V  2 


1 

0 

-2 


10.4.6.  Let  A  =  1 •  Show  that  eA  =  I. 

10.4.7. What is $e^{tO}$, where $O$ is the $n \times n$ zero matrix?

10.4.8. Find all matrices $A$ such that $e^{tA} = O$.

♦ 10.4.9. Explain in detail why the columns of $e^{tA}$ form a basis for the solution space to the
system $\dot u = A u$.

10.4.10. Let $A$ be a $2 \times 2$ matrix such that $\operatorname{tr} A = 0$ and $\delta = \sqrt{\det A} > 0$.
(a) Prove that $e^A = (\cos \delta)\, I + \dfrac{\sin \delta}{\delta}\, A$. Hint: Use Exercise 8.2.52.
(b) Establish a similar formula when $\det A < 0$. (c) What if $\det A = 0$?

10.4.11. Show that the origin is an asymptotically stable equilibrium solution to $\dot u = A u$ if and
only if $\lim_{t \to \infty} e^{tA} = O$.

10.4.12. Let $A$ be a real square matrix and $e^A$ its exponential. Under what conditions does the
linear system $\dot u = e^A u$ have an asymptotically stable equilibrium solution?

10.4.13. True or false: (a) $e^{A^{-1}} = (e^A)^{-1}$; (b) $e^{A + A^{-1}} = e^A\, e^{A^{-1}}$.

♦ 10.4.14. Prove formula (10.44). Hint: Fix $s$ and prove that, as functions of $t$, both sides of the
equation define matrix solutions with the same initial conditions. Then use uniqueness.

10.4.15. Prove that $A$ commutes with its exponential: $A\,e^{tA} = e^{tA} A$.


♦ 10.4.16. (a) Prove that the exponential of the transpose of a matrix is the transpose of its
exponential: $e^{tA^T} = (e^{tA})^T$. (b) What does this imply about the solutions to the linear
systems $\dot u = A u$ and $\dot v = A^T v$?

♦ 10.4.17. Prove that if $A = S B S^{-1}$ are similar matrices, then so are $e^{tA} = S\, e^{tB} S^{-1}$.

10.4.18. Prove that $e^{t(A - \lambda I)} = e^{-t\lambda}\, e^{tA}$ by showing that both sides are matrix solutions to
the same initial value problem.

♦ 10.4.19. Let $A$ be a real matrix. (a) Explain why $e^A$ is a real matrix. (b) Prove that $\det e^A > 0$.

10.4.20. Show that $\operatorname{tr} A = 0$ if and only if $\det e^{tA} = 1$ for all $t$.

♦ 10.4.21. Justify the matrix Leibniz rule (10.41) using the formula for matrix multiplication.

10.4.22. Prove that if $U(t)$ is any matrix solution to $\dfrac{dU}{dt} = A\,U$, then so is $\widetilde U(t) = U(t)\, C$,
where $C$ is any constant matrix (of compatible size).


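♦ 10.4.23. Prove that if $A = \begin{pmatrix} B & O \\ O & C \end{pmatrix}$ is a block diagonal matrix, then so is $e^{tA} = \begin{pmatrix} e^{tB} & O \\ O & e^{tC} \end{pmatrix}$.

♦ 10.4.24. (a) Prove that if $J_{0,n}$ is an $n \times n$ Jordan block matrix with 0 diagonal entries,
cf. (8.49), then

$$e^{t J_{0,n}} = \begin{pmatrix} 1 & t & \frac{t^2}{2} & \frac{t^3}{6} & \cdots & \frac{t^{n-1}}{(n-1)!} \\ 0 & 1 & t & \frac{t^2}{2} & \cdots & \frac{t^{n-2}}{(n-2)!} \\ 0 & 0 & 1 & t & \cdots & \frac{t^{n-3}}{(n-3)!} \\ \vdots & \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & t \\ 0 & 0 & 0 & \cdots & 0 & 1 \end{pmatrix}.$$

(b) Determine the exponential of a general Jordan block matrix $J_{\lambda,n}$. Hint: Use
Exercise 10.4.18. (c) Explain how you can use the Jordan canonical form to compute
the exponential of a matrix. Hint: Use Exercise 10.4.23.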

♦ 10.4.25. Diagonalization provides an alternative method for computing the exponential of a
complete matrix. (a) First show that if $D = \operatorname{diag}(d_1, \ldots, d_n)$ is a diagonal matrix, so is
$e^{tD} = \operatorname{diag}(e^{t d_1}, \ldots, e^{t d_n})$. (b) Second, using Exercise 10.4.17, prove that if $A = S D S^{-1}$
is diagonalizable, so is $e^{tA} = S\, e^{tD} S^{-1}$. (c) When possible, use diagonalization to compute
the exponentials of the matrices in Exercises 10.4.1-2.

♦ 10.4.26. (a) Prove that if $\lambda$ is an eigenvalue of $A$, then $e^{t\lambda}$ is an eigenvalue of $e^{tA}$. What is the
eigenvector? (b) Show that the eigenvalues have the same multiplicities.
Hint: Combine the Jordan canonical form (8.51) with Exercises 10.4.24 and 10.4.25.

♦ 10.4.27. Let $A$ be a symmetric matrix with Spectral Decomposition

$$A = \lambda_1 P_1 + \lambda_2 P_2 + \cdots + \lambda_k P_k,$$

as in (8.37). Prove that

$$e^{tA} = e^{t\lambda_1} P_1 + e^{t\lambda_2} P_2 + \cdots + e^{t\lambda_k} P_k.$$

♦ 10.4.28. (a) Show that $U(t)$ satisfies the matrix differential equation $\dot U = U B$ if and only if
$U(t) = C\, e^{tB}$, where $C = U(0)$. (b) If $U(0)$ is nonsingular, then $U(t)$ also satisfies a matrix
differential equation of the form $\dot U = A\,U$. Is $A = B$? Hint: Use Exercise 10.4.17.

10.4.29. True or false: The solution to the non-autonomous initial value problem
$\dot u = A(t)\, u$, $u(0) = b$, is $u(t) = \exp\left( \int_0^t A(s)\, ds \right) b$.

♥ 10.4.30. (a) Suppose $u_1(t), \ldots, u_n(t)$ are vector-valued functions whose values at each point $t$
are linearly independent vectors in $\mathbb{R}^n$. Show that they form a basis for the solution space
of a homogeneous constant coefficient linear system $\dot u = A u$ if and only if each $du_j/dt$
is a linear combination of $u_1(t), \ldots, u_n(t)$. Hint: Use Exercise 10.4.28. (b) Show that a
function $u(t)$ belongs to the solution space of a homogeneous constant coefficient linear
system $\dot u = A u$ if and only if $\dfrac{d^n u}{dt^n}$ is a linear combination of $u, \dfrac{du}{dt}, \ldots, \dfrac{d^{n-1} u}{dt^{n-1}}$.
Hint: Use Exercise 10.1.7.

♥ 10.4.31. By a (natural) logarithm of a matrix $B$ we mean a matrix $A$ such that $e^A = B$.

(a) Explain why only nonsingular matrices can have a logarithm.

(b) Comparing Exercises 10.4.6-7, explain why the matrix logarithm is not unique.

(c) Find all real logarithms of the $2 \times 2$ identity matrix $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.
Hint: Use Exercise 10.4.26.


Applications  in  Geometry 


Matrix exponentials are an effective tool for understanding the linear transformations that
appear in geometry and group theory, [93], quantum mechanics, [54], computer graphics
and animation, [5, 12, 72], computer vision, [73], and the symmetry analysis of differential
equations, [13, 60]. We will only be able to scratch the surface of this important and active
area of contemporary mathematical research.

Let $A$ be an $n \times n$ matrix. For each $t \in \mathbb{R}$, the corresponding exponential $e^{tA}$ is itself
an $n \times n$ matrix and thus defines a linear transformation on the vector space $\mathbb{R}^n$:

$$L_t[x] = e^{tA} x \qquad \text{for} \quad x \in \mathbb{R}^n.$$

In this manner, each square matrix $A$ generates a family of invertible linear transformations,
parameterized by $t \in \mathbb{R}$. The resulting linear transformations are not arbitrary, but are
subject to the following three rules:

$$L_t \circ L_s = L_{t+s} = L_s \circ L_t, \qquad L_0 = I, \qquad L_{-t} = L_t^{-1}. \tag{10.48}$$

These are merely restatements of three of the basic matrix exponential properties listed
in (10.39), (10.44), and (10.45). In particular, every transformation in the family commutes
with every other one.

In geometry, the family of transformations $L_t = e^{tA}$ is said to form a one-parameter
group† [60], with $t$ the parameter, and the matrix $A$ is referred to as its infinitesimal
generator. Indeed, by the series formula (10.47) for the matrix exponential,

$$L_t[x] = e^{tA} x = \bigl( I + tA + \tfrac12 t^2 A^2 + \cdots \bigr) x = x + tAx + \tfrac12 t^2 A^2 x + \cdots. \tag{10.49}$$

† See also Exercise 4.3.24 for the general definition of a group.

When $t$ is small, we can truncate the exponential series and approximate the transformation
by the linear function

$$F_t[x] = (I + tA)\, x = x + tAx \tag{10.50}$$

defined by the infinitesimal generator. We already made use of such approximations when
we discussed the rigid motions and mechanisms of structures in Chapter 6. As $t$ varies, the
group transformations (10.49) typically move a point $x$ along a curved trajectory. Under
the first order approximation (10.50), the point $x$ moves along a straight line in the direction
$b = Ax$, the tangent line to the curved trajectory. Thus, the infinitesimal generator of
a one-parameter group prescribes the tangent line approximation to the nonlinear motion
prescribed by the group transformations.
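To see the quality of the tangent line approximation in a concrete case, the following sketch takes $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, the generator of plane rotations, so that $e^{tA}x$ is a rotation of $x$ through angle $t$ and $(I + tA)x = (x_1 - t x_2,\ x_2 + t x_1)^T$. The function names are hypothetical. The gap between the two shrinks like $t^2$, so halving $t$ roughly quarters the error.

```python
import math


def exact_motion(t, x):
    """e^{tA} x for the rotation generator A = [[0, -1], [1, 0]]."""
    c, s = math.cos(t), math.sin(t)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])


def tangent_motion(t, x):
    """First order approximation (I + tA) x = x + t A x."""
    return (x[0] - t * x[1], x[1] + t * x[0])


def gap(t, x):
    """Euclidean distance between the exact and linearized positions."""
    e, f = exact_motion(t, x), tangent_motion(t, x)
    return math.hypot(e[0] - f[0], e[1] - f[1])


x0 = (1.0, 0.0)
for t in (0.04, 0.02, 0.01):
    print(t, gap(t, x0), gap(t, x0) / t**2)  # last column is close to 1/2
```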


Most  of  the  linear  transformations  of  interest  in  the  above-mentioned  applications  arise 
in  this  fashion.  Let’s  look  briefly  at  a  few  basic  examples. 


(a) When $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, then $e^{tA} = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$ represents a shearing transformation. The
group laws (10.48) imply that the composition of a shear of magnitude $s$ and a shear
of magnitude $t$ in the same direction is another shear of magnitude $s + t$.

(b) When $A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, then $e^{tA} = \begin{pmatrix} e^t & 0 \\ 0 & e^t \end{pmatrix}$ represents a uniform scaling transformation.
Composition and inverses of such scaling transformations are also scalings.

(c) When $A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, then $e^{tA} = \begin{pmatrix} e^t & 0 \\ 0 & e^{-t} \end{pmatrix}$, which, for $t > 0$, represents a stretch
in the $x$ direction and a contraction in the $y$ direction.

(d) When $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, then $e^{tA} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}$ is the matrix for a plane rotation,
around the origin, by angle $t$. The group laws (10.48) say that the composition of a
rotation through angle $s$ followed by a rotation through angle $t$ is a rotation through
angle $s + t$, as previously noted in Example 7.12. Also, the inverse of a rotation through
angle $t$ is a rotation through angle $-t$.
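The group laws (10.48) for these four families can be spot-checked numerically. The sketch below assumes the closed forms $\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$, $e^t I$, $\operatorname{diag}(e^t, e^{-t})$, and the rotation matrix through angle $t$. The function names and the tolerance are choices made for this illustration.

```python
import math


def shear(t):
    return [[1.0, t], [0.0, 1.0]]


def scaling(t):
    return [[math.exp(t), 0.0], [0.0, math.exp(t)]]


def stretch(t):
    return [[math.exp(t), 0.0], [0.0, math.exp(-t)]]


def rotation(t):
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


def compose(X, Y):
    """Matrix of the composition X o Y, i.e. the product X Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]


def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))


def satisfies_group_laws(family, s, t):
    """Check L_t L_s = L_{t+s} = L_s L_t, L_0 = I, and L_{-t} L_t = I."""
    identity = [[1.0, 0.0], [0.0, 1.0]]
    return (close(compose(family(t), family(s)), family(t + s))
            and close(compose(family(s), family(t)), family(t + s))
            and close(family(0.0), identity)
            and close(compose(family(-t), family(t)), identity))


for family in (shear, scaling, stretch, rotation):
    print(family.__name__, satisfies_group_laws(family, 0.4, -1.3))
```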


Observe  that  the  infinitesimal  generator  of  this  one-parameter  group  of  plane  rotations  is 
a  2  x  2  skew- symmetric  matrix.  This  turns  out  to  be  a  general  fact:  rotations  in  higher 
dimensions  are  also  generated  by  skew-symmetric  matrices. 

Lemma 10.29. If $A^T = -A$ is a skew-symmetric matrix, then, for all $t \in \mathbb{R}$, its matrix
exponential $Q(t) = e^{tA}$ is a proper orthogonal matrix.

Proof: According to equation (10.45) and Exercise 10.4.16,

$$Q(t)^{-1} = e^{-tA} = e^{tA^T} = \bigl(e^{tA}\bigr)^T = Q(t)^T,$$

which proves orthogonality. Properness, $\det Q = +1$, follows from Lemma 10.28 using the
fact that $\operatorname{tr} A = 0$, since all the diagonal entries of a skew-symmetric matrix are 0. Q.E.D.
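A quick numerical experiment illustrates the lemma. The sketch below approximates $e^{tA}$ by a truncated exponential series for one arbitrarily chosen $3 \times 3$ skew-symmetric matrix. The matrix, the number of terms, and the helper names are assumptions made for this example. It then checks that $Q^T Q \approx I$ and $\det Q \approx 1$.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def exp_series(A, t, terms=40):
    """Truncated exponential series I + tA + t^2 A^2/2! + ..."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[t * x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def transpose(X):
    return [list(col) for col in zip(*X)]


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[0.0, -3.0, 2.0], [3.0, 0.0, -1.0], [-2.0, 1.0, 0.0]]  # A^T = -A
Q = exp_series(A, 0.5)
QtQ = mat_mul(transpose(Q), Q)
print(QtQ)      # close to the identity matrix
print(det3(Q))  # close to +1
```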


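With some more work, it can be shown that every proper orthogonal matrix is the
exponential of some skew-symmetric matrix, albeit not a unique one. Thus, the $\frac12 n(n-1)$-dimensional
vector space of $n \times n$ skew-symmetric matrices generates the group of rotations
in $n$-dimensional Euclidean space. In the three-dimensional case, the three matrices
$A_x, A_y, A_z$ listed below form a basis and serve to generate, respectively, the one-parameter
groups of counterclockwise rotations around the $x$-, $y$-, and $z$-axes:

$$A_x = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad A_y = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \qquad A_z = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

$$e^{tA_x} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos t & -\sin t \\ 0 & \sin t & \cos t \end{pmatrix}, \quad e^{tA_y} = \begin{pmatrix} \cos t & 0 & \sin t \\ 0 & 1 & 0 \\ -\sin t & 0 & \cos t \end{pmatrix}, \quad e^{tA_z} = \begin{pmatrix} \cos t & -\sin t & 0 \\ \sin t & \cos t & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{10.51}$$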


Since  every  other  skew-symmetric  matrix  can  be  expressed  as  a  linear  combination  of 
Ax,Ay ,  and  Az,  every  rotation  can,  in  a  sense,  be  generated  by  these  three  basic  types. 
This  reconfirms  our  earlier  observations  concerning  the  number  of  rigid  motions  (rotations 
and  translations)  experienced  by  an  unattached  structure;  see  Section  6.3  for  details. 

In the three-dimensional case, it can be shown that every non-zero skew-symmetric
$3 \times 3$ matrix $A$ is singular, with one-dimensional kernel. Let $0 \neq v \in \ker A$ be the null
eigenvector. Then the matrix exponentials $e^{tA}$ form the one-parameter group of rotations
around the axis defined by $v$. For instance, referring to (10.51), $\ker A_x$ is spanned by
$e_1 = (1, 0, 0)^T$, reconfirming that it generates the rotations around the $x$-axis. Details can
be found in Exercise 10.4.38.

Noncommutativity of linear transformations is reflected in the noncommutativity of
their infinitesimal generators. Recall, (1.12), that the commutator of two $n \times n$ matrices
$A$, $B$ is

$$[A, B] = AB - BA. \tag{10.52}$$

Thus, $A$ and $B$ commute if and only if $[A, B] = O$. We use the exponential series (10.47)
to evaluate the commutator of the corresponding matrix exponentials:

$$\begin{aligned} \bigl[ e^{tA}, e^{tB} \bigr] &= e^{tA} e^{tB} - e^{tB} e^{tA} \\ &= \bigl( I + tA + \tfrac12 t^2 A^2 + \cdots \bigr)\bigl( I + tB + \tfrac12 t^2 B^2 + \cdots \bigr) - \bigl( I + tB + \tfrac12 t^2 B^2 + \cdots \bigr)\bigl( I + tA + \tfrac12 t^2 A^2 + \cdots \bigr) \\ &= t^2 (AB - BA) + \cdots = t^2\, [A, B] + \cdots. \end{aligned} \tag{10.53}$$

In particular, if the groups commute, then $[A, B] = O$. The converse is also true, since if
$AB = BA$ then all terms in the two series commute, and hence the matrix exponentials
also commute.


Proposition 10.30. The matrix exponentials $e^{tA}$ and $e^{tB}$ commute for all $t$ if and only
if the matrices $A$ and $B$ commute:

$$e^{tA} e^{tB} = e^{tB} e^{tA} = e^{t(A+B)} \qquad \text{provided} \qquad AB = BA. \tag{10.54}$$
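A small example, chosen here for illustration, shows what goes wrong without commutativity. Take $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$. Since $A^2 = B^2 = O$ and $(A+B)^2 = I$, the series give $e^{tA} = I + tA$, $e^{tB} = I + tB$, and $e^{t(A+B)} = (\cosh t)\, I + (\sinh t)(A + B)$ exactly. The Python sketch below (with hypothetical helper names) uses these closed forms. For this particular pair the commutator of the exponentials equals $t^2 [A, B]$ with no higher order terms.

```python
import math


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]


def commutator(X, Y):
    XY, YX = mat_mul(X, Y), mat_mul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(2)] for i in range(2)]


def exp_tA(t):      # A = [[0, 1], [0, 0]], A^2 = O
    return [[1.0, t], [0.0, 1.0]]


def exp_tB(t):      # B = [[0, 0], [1, 0]], B^2 = O
    return [[1.0, 0.0], [t, 1.0]]


def exp_t_sum(t):   # (A + B)^2 = I
    return [[math.cosh(t), math.sinh(t)], [math.sinh(t), math.cosh(t)]]


A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
t = 0.5
print(commutator(A, B))                  # [A, B] = diag(1, -1), not zero
print(commutator(exp_tA(t), exp_tB(t)))  # t^2 [A, B] = diag(0.25, -0.25)
print(mat_mul(exp_tA(t), exp_tB(t)))     # differs from the next line
print(exp_t_sum(t))
```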


In particular, the non-commutativity of three-dimensional rotations follows from the
non-commutativity of their infinitesimal skew-symmetric generators. For instance, the
commutator of the generators of rotations around the $x$- and $y$-axes is the generator of
rotations around the $z$-axis: $[A_x, A_y] = A_z$, since

$$\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix} - \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Hence, to a first order (or, more correctly, second order) approximation, the difference
between $x$ and $y$ rotations is, interestingly, a $z$ rotation.
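The commutator computation is easy to confirm with exact integer arithmetic. The sketch below assumes the generators $A_x$, $A_y$, $A_z$ of (10.51). The variable and function names are chosen for this example. It also evaluates the two companion commutators obtained by cyclically permuting $x$, $y$, $z$.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def commutator(X, Y):
    """[X, Y] = XY - YX."""
    XY, YX = mat_mul(X, Y), mat_mul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(len(X))] for i in range(len(X))]


# Generators of rotations around the x-, y-, and z-axes.
A_x = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
A_y = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
A_z = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

print(commutator(A_x, A_y) == A_z)  # [A_x, A_y] = A_z
print(commutator(A_y, A_z) == A_x)  # [A_y, A_z] = A_x
print(commutator(A_z, A_x) == A_y)  # [A_z, A_x] = A_y
```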


Exercises 


10.4.32.  Find  the  one-parameter  groups  generated  by  the  following  matrices  and  interpret 
geometrically:  What  are  the  trajectories?  What  are  the  fixed  points? 

o  o)’  (fa)  (?  o)’  w  (-3  o)’  ^  ■;)’  (*)  (;  j 


(a) 


10.4.33.  Write  down  the  one-parameter  groups  generated  by  the  following  matrices  and 
interpret.  What  are  the  trajectories?  What  are  the  fixed  points? 


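(a) $\begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, (b) $\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, (c) $\begin{pmatrix} 0 & 0 & -2 \\ 0 & 0 & 0 \\ 2 & 0 & 0 \end{pmatrix}$, (d) $\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, (e) $\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}$.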

10.4.34. (a) Find the one-parameter group of rotations generated by the skew-symmetric matrix

$$A = \begin{pmatrix} 0 & 1 & 1 \\ -1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}.$$

(b) As noted above, $e^{tA}$ represents a family of rotations around a fixed axis in $\mathbb{R}^3$. What is the axis?


10.4.35.  Choose  two  of  the  groups  in  Exercise  10.4.32  or  10.4.33,  and  determine  whether  or  not 
they  commute  by  looking  at  their  infinitesimal  generators.  Then  verify  your  conclusion  by 
directly  computing  the  commutator  of  the  corresponding  matrix  exponentials. 

10.4.36.  (a)  Prove  that  the  commutator  of  two  upper  triangular  matrices  is  upper  triangular. 

(b)  Prove  that  the  commutator  of  two  skew-symmetric  matrices  is  skew  symmetric. 

(c) Is the commutator of two symmetric matrices symmetric?


♦ 10.4.37. Prove that the Jacobi identity

$$[[A, B], C] + [[C, A], B] + [[B, C], A] = O \tag{10.55}$$

is valid for three $n \times n$ matrices $A, B, C$.


♥ 10.4.38. Let $\mathbf{0} \neq \mathbf{v} \in \mathbb{R}^3$. (a) Show that the cross product $L_{\mathbf{v}}[\mathbf{x}] = \mathbf{v} \times \mathbf{x}$ defines a linear transformation on $\mathbb{R}^3$. (b) Find the $3 \times 3$ matrix representative $A_{\mathbf{v}}$ of $L_{\mathbf{v}}$ and show that it is skew-symmetric. (c) Show that every non-zero skew-symmetric $3 \times 3$ matrix defines such a cross product map. (d) Show that $\ker A_{\mathbf{v}}$ is spanned by $\mathbf{v}$. (e) Justify the fact that the matrix exponentials $e^{tA_{\mathbf{v}}}$ are rotations around the axis $\mathbf{v}$. Thus, the cross product with a vector serves as the infinitesimal generator of the one-parameter group of rotations around $\mathbf{v}$.

♥ 10.4.39. Given a unit vector $\|\mathbf{u}\| = 1$ in $\mathbb{R}^3$, let $A = A_{\mathbf{u}}$ be the corresponding skew-symmetric $3 \times 3$ matrix that satisfies $A\mathbf{x} = \mathbf{u} \times \mathbf{x}$, as in Exercise 10.4.38. (a) Prove the Euler-Rodrigues formula $e^{tA} = I + (\sin t)\,A + (1 - \cos t)\,A^2$. Hint: Use the matrix exponential series (10.47). (b) Show that $e^{tA} = I$ if and only if $t$ is an integer multiple of $2\pi$. (c) Generalize parts (a) and (b) to a non-unit vector $\mathbf{v} \neq \mathbf{0}$.

♥ 10.4.40. Let $A = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, $\mathbf{b} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$. (a) Show that the solution to the linear system $\dot{\mathbf{x}} = A\mathbf{x}$ represents a rotation of $\mathbb{R}^3$ around the $z$-axis. What is the trajectory of a point $\mathbf{x}_0$? (b) Show that the solution to the inhomogeneous system $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{b}$ represents a screw motion of $\mathbb{R}^3$ around the $z$-axis. What is the trajectory of a point $\mathbf{x}_0$? (c) More generally, given $\mathbf{0} \neq \mathbf{a} \in \mathbb{R}^3$, show that the solution to $\dot{\mathbf{x}} = \mathbf{a} \times \mathbf{x} + \mathbf{a}$ represents a family of screw motions along the axis $\mathbf{a}$.
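10.4.41. Let $A$ be an $n \times n$ matrix whose last row has all zero entries. Prove that the last row of $e^{tA}$ is $\mathbf{e}_n^T = (0, \ldots, 0, 1)$.

10.4.42. Let $A = \begin{pmatrix} B & \mathbf{c} \\ \mathbf{0} & 0 \end{pmatrix}$ be in block form, where $B$ is an $n \times n$ matrix, $\mathbf{c} \in \mathbb{R}^n$, while $\mathbf{0}$ denotes the zero row vector with $n$ entries. Show that its matrix exponential is also in block form $e^{tA} = \begin{pmatrix} e^{tB} & \mathbf{f}(t) \\ \mathbf{0} & 1 \end{pmatrix}$. Can you find a formula for $\mathbf{f}(t)$?

♦ 10.4.43. According to Exercise 7.3.10, an $(n+1) \times (n+1)$ matrix of the block form $\begin{pmatrix} A & \mathbf{b} \\ \mathbf{0} & 1 \end{pmatrix}$, in which $A$ is an $n \times n$ matrix and $\mathbf{b} \in \mathbb{R}^n$, can be identified with the affine transformation $F[\mathbf{x}] = A\mathbf{x} + \mathbf{b}$ on $\mathbb{R}^n$. Exercise 10.4.42 shows that every matrix in the one-parameter group $e^{tB}$ generated by $B = \begin{pmatrix} A & \mathbf{b} \\ \mathbf{0} & 0 \end{pmatrix}$ has such a form, and hence we can identify $e^{tB}$ as a family of affine maps on $\mathbb{R}^n$. Describe the affine transformations of $\mathbb{R}^2$ generated by the following matrices:

(a) $\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, (b) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, (c) $\begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$, (d) $\begin{pmatrix} 1 & 0 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 0 \end{pmatrix}$.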


Invariant  Subspaces  and  Linear  Dynamical  Systems 

Invariant subspaces, as per Definition 8.27, play an important role in the study of homogeneous linear dynamical systems. In general, a subset $S \subset \mathbb{R}^n$ is called invariant for the homogeneous linear dynamical system $\dot{\mathbf{u}} = A\mathbf{u}$ if, whenever the initial condition $\mathbf{u}(t_0) = \mathbf{b} \in S$, then the solution $\mathbf{u}(t) \in S$ for all $t \in \mathbb{R}$.

Proposition 10.31. If $V \subset \mathbb{R}^n$ is an invariant subspace of the matrix $A$, then it is invariant under the corresponding homogeneous linear dynamical system.

Proof: Given that $\mathbf{b} \in V$, we have $A\mathbf{b} \in V$, $A^2\mathbf{b} \in V$, and, in general, $A^n\mathbf{b} \in V$ for each $n \geq 0$. Thus every term in the matrix exponential series for the solution (10.42), namely

$$\mathbf{u}(t) = e^{(t - t_0)A}\,\mathbf{b} = \sum_{n=0}^{\infty} \frac{(t - t_0)^n}{n!}\, A^n \mathbf{b},$$

belongs to $V$ and hence, because $V$ is closed, so does their sum: $\mathbf{u}(t) \in V$. Q.E.D.

As we know, the (complex) invariant subspaces of a complete matrix are spanned by its (complex) eigenvectors. According to the general Stability Theorem 10.13, these come in three flavors, depending upon whether the real part of the corresponding eigenvalue is less than, equal to, or greater than 0. The first kind, with $\operatorname{Re} \lambda < 0$, correspond to the asymptotically stable eigensolutions $\mathbf{u}(t) = e^{\lambda t}\mathbf{v} \to \mathbf{0}$ as $t \to \infty$. The second kind, with zero real part, correspond to stable eigensolutions that remain bounded for all $t$, by completeness. The third kind, with $\operatorname{Re} \lambda > 0$, correspond to unstable eigensolutions that become unbounded at an exponential rate as $t \to \infty$. A similar result holds for the corresponding real solutions of a complete real matrix. If the matrix is incomplete, then the solutions corresponding to Jordan chains with eigenvalues having negative real part are also asymptotically stable; those corresponding to Jordan chains with eigenvalues having positive real part remain exponentially unstable. If any purely imaginary eigenvalue is incomplete, then the polynomial factor in front of the corresponding Jordan chain solution makes it unstable, becoming unbounded at a polynomial rate. An example of the latter behavior is provided by a planar system that has 0 as its incomplete eigenvalue, producing unstable linear motion. The minimum dimension of a (real) system possessing a non-zero, incomplete purely imaginary eigenvalue is 4.

This  motivates  dissecting  the  underlying  vector  space  into  three  invariant  subspaces, 
having  only  the  zero  vector  in  common,  that  capture  the  three  possible  modes  of  behavior. 
We  state  the  definition  in  the  real  case,  leaving  the  simpler  complex  version  to  the  reader. 

Definition 10.32. Let $A$ be a real $n \times n$ matrix. We define the following invariant subspaces spanned by the real and imaginary parts of the eigenvectors and Jordan chains corresponding to the eigenvalues with the following properties:

(i) negative real part: the stable subspace $S \subset \mathbb{R}^n$;

(ii) zero real part: the center subspace $C \subset \mathbb{R}^n$;

(iii) positive real part: the unstable subspace $U \subset \mathbb{R}^n$.

If there are no eigenvalues of the specified type, the corresponding invariant subspace is trivial. For example, if the associated linear system has asymptotically stable zero solution, then $S = \mathbb{R}^n$ while $C = U = \{\mathbf{0}\}$. The stable, unstable, and center subspaces are complementary, as in Exercise 2.2.24, in the sense that their pairwise intersections are trivial: $S \cap C = S \cap U = C \cap U = \{\mathbf{0}\}$, and their sum $S + C + U = \mathbb{R}^n$, in the sense that every $\mathbf{v} \in \mathbb{R}^n$ can be, in fact uniquely, written as a sum $\mathbf{v} = \mathbf{s} + \mathbf{c} + \mathbf{u}$ of vectors in each subspace: $\mathbf{s} \in S$, $\mathbf{c} \in C$, $\mathbf{u} \in U$.

Since  each  of  these  subspaces  is  invariant,  if  the  initial  condition  belongs  to  one  of  them, 
so  does  the  corresponding  solution.  In  view  of  the  solution  formulas  in  Theorem  10.13,  we 
deduce  the  following  more  intrinsic  characterizations,  in  terms  of  the  asymptotic  behavior 
of  the  solutions  to  the  homogeneous  linear  dynamical  system. 


Theorem 10.33. Let $A$ be an $n \times n$ matrix. Let $\mathbf{0} \neq \mathbf{b} \in \mathbb{R}^n$, and let $\mathbf{u}(t)$ be a solution to the associated initial value problem $\dot{\mathbf{u}} = A\mathbf{u}$, $\mathbf{u}(t_0) = \mathbf{b}$. Then $\mathbf{b}$ and hence $\mathbf{u}(t)$ are in:

(i) the stable subspace $S$ if and only if $\mathbf{u}(t) \to \mathbf{0}$ as $t \to \infty$, or, alternatively, $\|\mathbf{u}(t)\| \to \infty$ at an exponential rate as $t \to -\infty$;

(ii) the center subspace $C$ if and only if $\mathbf{u}(t)$ is bounded or $\|\mathbf{u}(t)\| \to \infty$ at a polynomial rate as $t \to \pm\infty$;

(iii) the unstable subspace $U$ if and only if $\|\mathbf{u}(t)\| \to \infty$ at an exponential rate as $t \to \infty$, or, alternatively, $\mathbf{u}(t) \to \mathbf{0}$ as $t \to -\infty$.


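Example 10.34. For example, the matrix $A = \begin{pmatrix} -2 & 1 & 0 \\ 1 & -1 & 1 \\ 0 & 1 & -2 \end{pmatrix}$ has eigenvalues and eigenvectors

$$\lambda_1 = 0, \quad \mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \qquad \lambda_2 = -2, \quad \mathbf{v}_2 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \qquad \lambda_3 = -3, \quad \mathbf{v}_3 = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}.$$

Thus, the stable subspace is the plane spanned by $\mathbf{v}_2$ and $\mathbf{v}_3$, whose nonzero solutions tend to the origin as $t \to \infty$ at an exponential rate; the center subspace is the line spanned by $\mathbf{v}_1$, all of whose solutions are constant; the unstable subspace is trivial: $U = \{\mathbf{0}\}$. So the origin is a stable, but not asymptotically stable, equilibrium point.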
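The splitting in Example 10.34 can be checked numerically. The sketch below is plain Python written for illustration, not part of the text's development: it takes the eigenvalues $0, -2, -3$ with eigenvectors $(1,2,1)^T$, $(-1,0,1)^T$, $(1,-1,1)^T$ (easily verified by direct multiplication) and evaluates $e^{tA}\mathbf{b}$ by summing the exponential series. The helper names `matvec` and `flow` are assumptions of this sketch.

```python
import math

A = [[-2.0, 1.0, 0.0],
     [1.0, -1.0, 1.0],
     [0.0, 1.0, -2.0]]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def flow(M, b, t, terms=60):
    """u(t) = e^{tM} b, summing the series sum_n t^n M^n b / n! term by term."""
    u, term = list(b), list(b)
    for n in range(1, terms):
        term = [t * x / n for x in matvec(M, term)]
        u = [p + q for p, q in zip(u, term)]
    return u

v1, v2, v3 = [1.0, 2.0, 1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 1.0]
t = 2.0

# Initial data on the center line: the solution should stay put.
center = flow(A, v1, t)

# Initial data in the stable plane: the solution should equal
# e^{-2t} v2 + e^{-3t} v3 and therefore decay exponentially.
stable = flow(A, [p + q for p, q in zip(v2, v3)], t)
expected = [math.exp(-2 * t) * p + math.exp(-3 * t) * q for p, q in zip(v2, v3)]

print(center, stable, expected)
```

A solution that starts on the line spanned by $\mathbf{v}_1$ does not move, while one that starts in the plane spanned by $\mathbf{v}_2, \mathbf{v}_3$ shrinks toward the origin, in line with the description of the center and stable subspaces.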


The Center Manifold Theorem, a celebrated result in nonlinear dynamics, [34], states that the above formulated linear splitting into stable, center, and unstable regimes carries over to nonlinear systems in a neighborhood of an equilibrium point. Roughly speaking, suppose that $\mathbf{u}_0$ is an equilibrium point of the nonlinear system of ordinary differential equations $\dot{\mathbf{u}} = \mathbf{f}(\mathbf{u})$, so that $\mathbf{f}(\mathbf{u}_0) = \mathbf{0}$. Let $A$ be the linearization of $\mathbf{f}(\mathbf{u})$ at $\mathbf{u}_0$, meaning its Jacobian matrix, so $A = \mathbf{f}'(\mathbf{u}_0) = (\partial f_i / \partial u_j)$ evaluated at the equilibrium point. Then, in a neighborhood of $\mathbf{u}_0$, the dynamical system admits three invariant curved manifolds, meaning curves, surfaces, and their higher-dimensional counterparts, called the stable, center, and unstable manifolds that are tangent to (or equivalently, approximated by) the corresponding invariant subspaces of its linearization matrix $A$. Solutions evolving on the stable and unstable manifolds exhibit behaviors similar to those of the linear system. In particular, solutions on the stable manifold converge to the equilibrium, $\mathbf{u}(t) \to \mathbf{u}_0$ as $t \to \infty$, at an exponential rate governed by the corresponding eigenvalues of $A$, while those on the unstable manifold move away from the equilibrium point $\mathbf{u}_0$, although one cannot say what happens to them once they exit the neighborhood and the nonlinear effects take over. Solutions on the center manifold have more subtle dynamical behavior that depends on the detailed structure of the nonlinear terms. In this manner, one can effectively argue that, near a fixed point, all the interesting dynamics takes place on the center manifold.


Exercises 

10.4.44. (a) Given a homogeneous linear dynamical system with invariant stable, unstable, and center subspaces $S, U, C$, explain why the origin is asymptotically stable if and only if $C = U = \{\mathbf{0}\}$. (b) Is the origin stable if $U = \{\mathbf{0}\}$ but $C \neq \{\mathbf{0}\}$?

10.4.45.  Find  the  (real)  stable,  unstable,  and  center  subspaces  of  the  following  linear  systems: 

iq  =  z2, 


(a) 


aq  =  9u2, 


u 


2  —  un 
U-i  —  U-i  —  3u<2  +  lluo, 


(b) 


x1  =  4x1  +  x2 


-  Q  ry»  • 

tb  ^  e)  tAy  "j^  ^ 


(C) 


Vi  =Vi  -y2i 
y2  =  2y1  +  3  y2; 


(d)  Zn  —  3z-i  T  2 


1  —  ui  ■>  u'2  I  u'3 
(e)  u2  =  2 u1  —  6u2  +  1 6a3, 

=  u1  -  3u2  +  7u3, 


(0 


du 

dt 


(-1  3  -3\ 

2  2-7 

\  0  3  -4/ 


u. 


(g) 


2/3 : 

du 

dt 


1 

^2  5 

/0  0 
0  0 
1 

VO 


z 


3’ 


1  o\ 
0  2 
0  0  0 
2  0  0/ 


u. 


♦ 10.4.46. State and prove a counterpart to Definition 10.32 and Theorem 10.33 for a homogeneous linear iterative system.


Inhomogeneous  Linear  Systems 

We now direct our attention to inhomogeneous linear systems of ordinary differential equations. For simplicity, we consider only first order† systems of the form

$$\frac{d\mathbf{u}}{dt} = A\mathbf{u} + \mathbf{f}(t), \tag{10.56}$$

in which $A$ is a constant $n \times n$ matrix and $\mathbf{f}(t)$ is a vector-valued function of $t$ that can be interpreted as a collection of time-varying external forces acting on the system. According to Theorem 7.38, the solution to the inhomogeneous system will have the general form

$$\mathbf{u}(t) = \mathbf{u}^\star(t) + \mathbf{z}(t),$$

where $\mathbf{u}^\star(t)$ is a particular solution, representing a response to the forcing, while $\mathbf{z}(t)$ is a solution to the corresponding homogeneous system $\dot{\mathbf{z}} = A\mathbf{z}$, representing the system's internal motion. Since we now know how to find the solution $\mathbf{z}(t)$ to the homogeneous system, the only task is to find one particular solution to the inhomogeneous system.

† Higher order systems can, as in the phase plane construction (10.8), always be converted into first order systems involving additional variables.

In your first course on ordinary differential equations, you probably encountered a method known as variation of parameters for constructing particular solutions of inhomogeneous scalar ordinary differential equations, [7]. The method can be readily adapted to first order systems. Recall that, in the scalar case, to solve the inhomogeneous equation

$$\frac{du}{dt} = a\,u + f(t), \quad \text{we set} \quad u(t) = e^{ta}\, v(t), \tag{10.57}$$

where the function $v(t)$ is to be determined. Differentiating, we obtain

$$\frac{du}{dt} = a\, e^{ta}\, v(t) + e^{ta}\, \frac{dv}{dt} = a\,u + e^{ta}\, \frac{dv}{dt}.$$

Therefore, $u(t)$ satisfies the differential equation (10.57) if and only if

$$\frac{dv}{dt} = e^{-ta} f(t).$$

Since the right-hand side of the latter is known, $v(t)$ can be immediately found by a direct integration.
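As a brief illustration of the scalar recipe, consider the equation $du/dt = 2u + e^{t}$; the coefficient $a = 2$ and the forcing $f(t) = e^{t}$ are chosen here purely as an example and do not come from the text.

```latex
\[
\frac{du}{dt} = 2u + e^{t}, \qquad u(t) = e^{2t}\, v(t)
\quad\Longrightarrow\quad
\frac{dv}{dt} = e^{-2t}\, e^{t} = e^{-t},
\]
\[
v(t) = c - e^{-t}, \qquad u(t) = e^{2t}\, v(t) = c\, e^{2t} - e^{t}.
\]
```

Differentiating confirms the result: $du/dt = 2c\,e^{2t} - e^{t}$, which agrees with $2u + e^{t} = 2c\,e^{2t} - 2e^{t} + e^{t}$. The term $c\,e^{2t}$ is the general solution of the homogeneous equation, and $-e^{t}$ is a particular solution.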

The method can be extended to the vector-valued situation as follows. We replace the scalar exponential by the exponential of the coefficient matrix, setting

$$\mathbf{u}(t) = e^{tA}\, \mathbf{v}(t), \tag{10.58}$$

where $\mathbf{v}(t)$ is a vector-valued function that is to be determined. Combining the product rule for matrix multiplication (10.41) with (10.39) yields

$$\frac{d\mathbf{u}}{dt} = \frac{d}{dt}\bigl(e^{tA}\mathbf{v}\bigr) = \frac{d e^{tA}}{dt}\,\mathbf{v} + e^{tA}\,\frac{d\mathbf{v}}{dt} = A\, e^{tA}\mathbf{v} + e^{tA}\,\frac{d\mathbf{v}}{dt} = A\mathbf{u} + e^{tA}\,\frac{d\mathbf{v}}{dt}.$$

Comparing with the differential equation (10.56), we conclude that

$$\frac{d\mathbf{v}}{dt} = e^{-tA}\, \mathbf{f}(t).$$


Integrating‡ both sides from the initial time $t_0$ to time $t$ produces, by the Fundamental Theorem of Calculus,

$$\mathbf{v}(t) = \mathbf{v}(t_0) + \int_{t_0}^{t} e^{-sA}\, \mathbf{f}(s)\, ds, \quad \text{where} \quad \mathbf{v}(t_0) = e^{-t_0 A}\, \mathbf{u}(t_0). \tag{10.59}$$

Substituting back into (10.58) leads to a general formula for the solution to the inhomogeneous linear system.

Theorem 10.35. The solution to the initial value problem

$$\frac{d\mathbf{u}}{dt} = A\mathbf{u} + \mathbf{f}(t), \qquad \mathbf{u}(t_0) = \mathbf{b},$$

is

$$\mathbf{u}(t) = e^{(t - t_0)A}\, \mathbf{b} + \int_{t_0}^{t} e^{(t - s)A}\, \mathbf{f}(s)\, ds. \tag{10.60}$$

‡ As with differentiation, vector-valued and matrix-valued functions are integrated entry-wise.




In the solution formula, the integral term can be viewed as a particular solution $\mathbf{u}^\star(t)$, namely the one satisfying the initial condition $\mathbf{u}^\star(t_0) = \mathbf{0}$, while the first summand, $\mathbf{z}(t) = e^{(t - t_0)A}\,\mathbf{b}$ for $\mathbf{b} \in \mathbb{R}^n$, constitutes the general solution to the homogeneous system.


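Example 10.36. Our goal is to solve the initial value problem

$$\begin{aligned} \dot u_1 &= 2u_1 - u_2, & u_1(0) &= 1, \\ \dot u_2 &= 4u_1 - 3u_2 + e^{t}, & u_2(0) &= 0. \end{aligned} \tag{10.61}$$

The first step is to determine the eigenvalues and eigenvectors of the coefficient matrix

$$A = \begin{pmatrix} 2 & -1 \\ 4 & -3 \end{pmatrix}, \quad \text{so} \quad \lambda_1 = 1, \quad \mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \lambda_2 = -2, \quad \mathbf{v}_2 = \begin{pmatrix} 1 \\ 4 \end{pmatrix}.$$

The resulting eigensolutions form the columns of the nonsingular matrix solution

$$U(t) = \begin{pmatrix} e^{t} & e^{-2t} \\ e^{t} & 4e^{-2t} \end{pmatrix}, \quad \text{hence} \quad e^{tA} = U(t)\,U(0)^{-1} = \begin{pmatrix} \tfrac43 e^{t} - \tfrac13 e^{-2t} & -\tfrac13 e^{t} + \tfrac13 e^{-2t} \\ \tfrac43 e^{t} - \tfrac43 e^{-2t} & -\tfrac13 e^{t} + \tfrac43 e^{-2t} \end{pmatrix}.$$

Since $t_0 = 0$, the two constituents of the solution formula (10.60) are

$$e^{tA}\,\mathbf{b} = \begin{pmatrix} \tfrac43 e^{t} - \tfrac13 e^{-2t} & -\tfrac13 e^{t} + \tfrac13 e^{-2t} \\ \tfrac43 e^{t} - \tfrac43 e^{-2t} & -\tfrac13 e^{t} + \tfrac43 e^{-2t} \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \tfrac43 e^{t} - \tfrac13 e^{-2t} \\ \tfrac43 e^{t} - \tfrac43 e^{-2t} \end{pmatrix},$$

which forms the solution to the homogeneous system for the given nonzero initial conditions, and

$$\begin{aligned} \int_0^t e^{(t-s)A}\, \mathbf{f}(s)\, ds &= \int_0^t \begin{pmatrix} \tfrac43 e^{t-s} - \tfrac13 e^{-2(t-s)} & -\tfrac13 e^{t-s} + \tfrac13 e^{-2(t-s)} \\ \tfrac43 e^{t-s} - \tfrac43 e^{-2(t-s)} & -\tfrac13 e^{t-s} + \tfrac43 e^{-2(t-s)} \end{pmatrix} \begin{pmatrix} 0 \\ e^{s} \end{pmatrix} ds \\ &= \int_0^t \begin{pmatrix} -\tfrac13 e^{t} + \tfrac13 e^{-2t+3s} \\ -\tfrac13 e^{t} + \tfrac43 e^{-2t+3s} \end{pmatrix} ds = \begin{pmatrix} -\tfrac13 t\,e^{t} + \tfrac19 \bigl(e^{t} - e^{-2t}\bigr) \\ -\tfrac13 t\,e^{t} + \tfrac49 \bigl(e^{t} - e^{-2t}\bigr) \end{pmatrix}, \end{aligned}$$

which is the particular solution to the inhomogeneous system that satisfies the homogeneous initial conditions $\mathbf{u}(0) = \mathbf{0}$. The solution to our initial value problem is their sum:

$$\mathbf{u}(t) = \begin{pmatrix} -\tfrac13 t\,e^{t} + \tfrac{13}{9} e^{t} - \tfrac49 e^{-2t} \\ -\tfrac13 t\,e^{t} + \tfrac{16}{9} e^{t} - \tfrac{16}{9} e^{-2t} \end{pmatrix}.$$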

Exercises 


rt1  =  2it1 

10.4.47.  Solve  the  following  initial  value  problems:  (a) 


—  u 


2  ’ 


^2  —  4i^1 


2 1 


U1  =  ~u2i 


u  1  =  +  2a2  +  e  ,  4^(1)  =  1, 

(b)  .  t  ,  x  (c)  .  _  .  ,  , 

a2  =  2 16-[_  —  a2  +  e  ,  tt2(l)  =  1.  u2  ~  ^ ui  T  cost, 


a  =  3u  +  v  +  1,  4^(1)  =  1, 
v  =  4i£  +  t,  v  (1)  =  — 1. 


,  x  i>  =  p  +  q  +  t,  p{ 0) 

q  =  -p-q  +  t,  q( 0) 


3 1^2  T  6 
«l(°)  =  °> 

m2(°)  =  1. 

0, 

0. 


10.4.48.  Solve  the  following  initial  value  problems: 

u1  =  —2  a2  +  2a3,  1^(0)  =  1,  —  2  u2, 

(a)  u2  = -a1 +u2  -  2m3 +t,  u2(0)  =  0,  (b)  «2  =  -w2  +  e_t, 

«3  =  —  3mj  +  w2  ~  2m3  +  r  ^3(0)  =  0.  m3  =  4-u.j  —  4u2  —  u3, 


«i(0)  =  0, 

a2(0)  =  0. 


«i(0)  =  -1, 

m2(°)  =  0, 

w3(°)  =  -1- 
10.4.49. Suppose that $\lambda$ is not an eigenvalue of $A$. Show that the inhomogeneous system $\dot{\mathbf{u}} = A\mathbf{u} + e^{\lambda t}\,\mathbf{v}$ has a solution of the form $\mathbf{u}^\star(t) = e^{\lambda t}\,\mathbf{w}$, where $\mathbf{w}$ is a constant vector. What is the general solution?

10.4.50. (a) Write down an integral formula for the solution to the initial value problem $\dfrac{d\mathbf{u}}{dt} = A\mathbf{u} + \mathbf{b}$, $\mathbf{u}(0) = \mathbf{0}$, where $\mathbf{b}$ is a constant vector.

(b) Suppose $\mathbf{b} \in \operatorname{img} A$. Do you recover the solution you found in Exercise 10.2.24?


10.5  Dynamics  of  Structures 

Chapter  6  was  concerned  with  the  equilibrium  configurations  of  mass-spring  chains  and, 
more  generally,  structures  constructed  out  of  elastic  bars.  We  are  now  able  to  undertake 
an  analysis  of  their  dynamical  motions,  which  are  governed  by  second  order  systems  of 
ordinary  differential  equations.  The  same  systems  also  serve  to  model  the  vibrations  of 
molecules,  of  fundamental  importance  in  chemistry  and  spectroscopy,  [91].  As  in  the  first 
order  case,  the  eigenvalues  of  the  coefficient  matrix  play  an  essential  role  in  both  the  explicit 
solution  formula  and  the  system’s  qualitative  behavior (s). 

Let us begin with a mass-spring chain consisting of $n$ masses $m_1, \ldots, m_n$ connected together in a row and, possibly, to top and bottom supports by springs. As in Section 6.1, we restrict our attention to purely one-dimensional motion of the masses in the direction of the chain. Thus the collective motion of the chain is prescribed by the displacement vector $\mathbf{u}(t) = (u_1(t), \ldots, u_n(t))^T$ whose $i$th entry represents the displacement from equilibrium of the $i$th mass. Since we are now interested in dynamics, the displacements are allowed to depend on time, $t$.

The motion of each mass is subject to Newton's Second Law:

$$\text{Force} = \text{Mass} \times \text{Acceleration}. \tag{10.62}$$

The acceleration of the $i$th mass is the second derivative $\ddot u_i = d^2 u_i / dt^2$ of its displacement $u_i(t)$, so the right-hand side of Newton's Law is $m_i \ddot u_i$. These form the entries of the vector $M \ddot{\mathbf{u}}$ obtained by multiplying the acceleration vector by the diagonal, positive definite mass matrix $M = \operatorname{diag}(m_1, \ldots, m_n)$. Keep in mind that the masses of the springs are assumed to be negligible in this model.

If, to begin with, we assume that there are no frictional effects, then the force exerted on each mass is the difference between the external force, if any, and the internal force due to the elongations of its two connecting springs. According to (6.11), the internal forces are the entries of the product $K\mathbf{u}$, where $K = A^T C A$ is the stiffness matrix, constructed from the chain's (reduced) incidence matrix $A$, and the diagonal matrix of spring constants $C$. Thus, Newton's law immediately leads to the linear system of second order differential equations

$$M \frac{d^2\mathbf{u}}{dt^2} = \mathbf{f}(t) - K\mathbf{u}, \tag{10.63}$$

governing the dynamical motions of the masses under a possibly time-dependent external force $\mathbf{f}(t)$. Such systems are also used to model the undamped dynamical motion of structures and molecules as well as resistanceless (superconducting) electrical circuits.

As always, the first order of business is to analyze the corresponding homogeneous system

$$M \frac{d^2\mathbf{u}}{dt^2} + K\mathbf{u} = \mathbf{0}, \tag{10.64}$$

modeling the unforced motions of the physical apparatus.
Example 10.37. The simplest case is that of a single mass connected to a fixed support by a spring. Assuming no external force, the dynamical system (10.64) reduces to a homogeneous second order scalar equation

$$m \frac{d^2 u}{dt^2} + k\,u = 0, \tag{10.65}$$

in which $m > 0$ is the mass, while $k > 0$ is the spring's stiffness. The general solution to (10.65) is

$$u(t) = c_1 \cos \omega t + c_2 \sin \omega t = r \cos(\omega t - \delta), \quad \text{where} \quad \omega = \sqrt{\frac{k}{m}} \tag{10.66}$$

is the natural frequency of vibration. In the second expression, we have used the phase-amplitude equation (2.7) to rewrite the solution as a single cosine with

$$\text{amplitude} \quad r = \sqrt{c_1^2 + c_2^2} \quad \text{and phase shift} \quad \delta = \tan^{-1} \frac{c_2}{c_1}. \tag{10.67}$$

Thus, the mass' motion is periodic, with period $P = 2\pi/\omega$. The stiffer the spring or the lighter the mass, the faster the vibrations. Take note of the square root in the frequency formula; quadrupling the mass slows down the vibrations by only a factor of two.

The constants $c_1, c_2$ (or their phase-amplitude counterparts $r, \delta$) are determined by the initial conditions. Physically, we need to specify both an initial position and an initial velocity

$$u(t_0) = a, \qquad \dot u(t_0) = b, \tag{10.68}$$

in order to uniquely prescribe the subsequent motion of the system. The resulting solution is most conveniently written in the form

$$u(t) = a \cos \omega(t - t_0) + \frac{b}{\omega} \sin \omega(t - t_0) = r \cos\bigl[\omega(t - t_0) - \delta\bigr]$$

$$\text{with amplitude} \quad r = \sqrt{a^2 + \frac{b^2}{\omega^2}} \quad \text{and phase shift} \quad \delta = \tan^{-1} \frac{b}{a\,\omega}. \tag{10.69}$$

A typical solution is plotted in Figure 10.5.
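The formulas for the single mass can be tried out with a few lines of Python. This is only a sketch: the function name `oscillator` and the sample values of $m$, $k$, $a$, $b$ are assumptions made for illustration, and the phase shift is computed with `math.atan2` so that the correct quadrant is selected even when $a \le 0$, a case in which the plain inverse tangent of $b/(a\omega)$ needs adjusting by $\pi$.

```python
import math

def oscillator(m, k, a, b, t0=0.0):
    """Return (omega, r, delta, u, u_phase) for m u'' + k u = 0,
    with u(t0) = a and u'(t0) = b."""
    omega = math.sqrt(k / m)                 # natural frequency
    r = math.hypot(a, b / omega)             # amplitude
    delta = math.atan2(b / omega, a)         # phase shift, quadrant-aware

    def u(t):
        # Sine-cosine form of the solution.
        return (a * math.cos(omega * (t - t0))
                + (b / omega) * math.sin(omega * (t - t0)))

    def u_phase(t):
        # Equivalent phase-amplitude form.
        return r * math.cos(omega * (t - t0) - delta)

    return omega, r, delta, u, u_phase

omega, r, delta, u, u_phase = oscillator(m=1.0, k=4.0, a=1.0, b=-2.0)
omega_heavy = oscillator(m=4.0, k=4.0, a=1.0, b=-2.0)[0]
print(omega, omega_heavy, r, delta)
```

With $k = 4$, raising the mass from 1 to 4 lowers the frequency from 2 to 1, illustrating that quadrupling the mass only halves the vibration rate. The two solution forms `u` and `u_phase` return the same values at every time.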


Let us turn to a more general second order system. To begin with, let us assume that the masses are all the same and equal to 1 (in some appropriate units), so that (10.64) reduces to

$$\frac{d^2\mathbf{u}}{dt^2} + K\mathbf{u} = \mathbf{0}. \tag{10.70}$$

Inspired by the form of the solution of the scalar equation, let us try a trigonometric ansatz for the solution, setting

$$\mathbf{u}(t) = \cos(\omega t)\,\mathbf{v}, \tag{10.71}$$
in which the vibrational frequency $\omega$ is a constant scalar and $\mathbf{v} \neq \mathbf{0}$ a constant vector. Differentiation produces

$$\frac{d\mathbf{u}}{dt} = -\,\omega \sin(\omega t)\,\mathbf{v}, \qquad \frac{d^2\mathbf{u}}{dt^2} = -\,\omega^2 \cos(\omega t)\,\mathbf{v}, \qquad \text{whereas} \qquad K\mathbf{u} = \cos(\omega t)\,K\mathbf{v},$$

since the cosine factor is a scalar. Therefore, (10.71) will solve the second order system (10.70) if and only if

$$K\mathbf{v} = \omega^2\, \mathbf{v}. \tag{10.72}$$

The result is in the form of the eigenvalue equation $K\mathbf{v} = \lambda \mathbf{v}$ for the stiffness matrix $K$, with eigenvector $\mathbf{v} \neq \mathbf{0}$ and eigenvalue

$$\lambda = \omega^2. \tag{10.73}$$

Now, the scalar equation has both cosine and sine solutions. By the same token, the ansatz $\mathbf{u}(t) = \sin(\omega t)\,\mathbf{v}$ leads to the same eigenvector equation (10.72). We conclude that each positive eigenvalue leads to two different periodic trigonometric solutions.

Summarizing, we have established:

Lemma 10.38. If $\mathbf{v}$ is an eigenvector of the positive definite matrix $K$ with eigenvalue $\lambda = \omega^2 > 0$, then $\mathbf{u}(t) = \cos(\omega t)\,\mathbf{v}$ and $\widetilde{\mathbf{u}}(t) = \sin(\omega t)\,\mathbf{v}$ are both solutions to the homogeneous second order system $\ddot{\mathbf{u}} + K\mathbf{u} = \mathbf{0}$.


Stable  Structures 


Let us begin with the motion of a stable mass-spring chain or structure, of the type introduced in Section 6.3. According to Theorem 6.8, stability requires that the reduced stiffness matrix be positive definite: $K > 0$. Theorem 8.35 then says that all the eigenvalues of $K$ are strictly positive, $\lambda_i > 0$, which is good, since it implies that the vibrational frequencies $\omega_i = \sqrt{\lambda_i}$ are all real. Moreover, positive definite matrices are always complete, and so $K$ possesses an orthogonal eigenvector basis $\mathbf{v}_1, \ldots, \mathbf{v}_n$ of $\mathbb{R}^n$ corresponding to its eigenvalues $\lambda_1, \ldots, \lambda_n$, listed in accordance with their multiplicities. This yields a total of $2n$ linearly independent trigonometric eigensolutions, namely

$$\begin{aligned} \mathbf{u}_i(t) &= \cos(\omega_i t)\,\mathbf{v}_i = \cos\bigl(\sqrt{\lambda_i}\, t\bigr)\,\mathbf{v}_i, \\ \widetilde{\mathbf{u}}_i(t) &= \sin(\omega_i t)\,\mathbf{v}_i = \sin\bigl(\sqrt{\lambda_i}\, t\bigr)\,\mathbf{v}_i, \end{aligned} \qquad i = 1, \ldots, n, \tag{10.74}$$

which is precisely the number required by the general existence and uniqueness theorems for linear ordinary differential equations. The general solution to (10.70) is an arbitrary linear combination of the eigensolutions:

$$\mathbf{u}(t) = \sum_{i=1}^{n} \bigl[\, c_i \cos(\omega_i t) + d_i \sin(\omega_i t) \,\bigr]\, \mathbf{v}_i = \sum_{i=1}^{n} r_i \cos(\omega_i t - \delta_i)\, \mathbf{v}_i. \tag{10.75}$$

The $2n$ coefficients $c_i, d_i$ (or their phase-amplitude counterparts $r_i \geq 0$ and $0 \leq \delta_i < 2\pi$) are uniquely determined by the initial conditions. As in (10.68), we need to specify both the initial positions and initial velocities of all the masses; this requires a total of $2n$ initial conditions

$$\mathbf{u}(t_0) = \mathbf{a}, \qquad \dot{\mathbf{u}}(t_0) = \mathbf{b}. \tag{10.76}$$

Suppose $t_0 = 0$; then substituting the solution formula (10.75) into the initial conditions, we obtain

$$\mathbf{u}(0) = \sum_{i=1}^{n} c_i\, \mathbf{v}_i = \mathbf{a}, \qquad \dot{\mathbf{u}}(0) = \sum_{i=1}^{n} \omega_i\, d_i\, \mathbf{v}_i = \mathbf{b}.$$
Figure 10.6. Periodic and Quasi-Periodic Functions. (Panel title: $\cos t + \cos \sqrt{5}\, t$.)


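Since the eigenvectors are orthogonal, the coefficients are immediately found by our orthogonal basis formula (4.7), whence

$$c_i = \frac{\langle \mathbf{a}, \mathbf{v}_i \rangle}{\|\mathbf{v}_i\|^2}, \qquad d_i = \frac{\langle \mathbf{b}, \mathbf{v}_i \rangle}{\omega_i\, \|\mathbf{v}_i\|^2}. \tag{10.77}$$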
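The recipe for fitting initial data can be sketched in code. With an orthogonal eigenvector basis, the conditions $\sum c_i \mathbf{v}_i = \mathbf{a}$ and $\sum \omega_i d_i \mathbf{v}_i = \mathbf{b}$ give $c_i = \langle \mathbf{a}, \mathbf{v}_i\rangle / \|\mathbf{v}_i\|^2$ and $d_i = \langle \mathbf{b}, \mathbf{v}_i\rangle / (\omega_i \|\mathbf{v}_i\|^2)$. The plain-Python example below uses the stiffness matrix of a two-mass chain with unit springs, whose eigenpairs $1, (1,1)^T$ and $3, (-1,1)^T$ are supplied by hand; the initial data and all function names are hypothetical choices made for this illustration.

```python
import math

K = [[2.0, -1.0], [-1.0, 2.0]]
eigenpairs = [(1.0, [1.0, 1.0]), (3.0, [-1.0, 1.0])]   # (lambda_i, v_i), orthogonal

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def mode_coefficients(a, b):
    """Return a list of (omega_i, c_i, d_i, v_i) fitted to u(0) = a, u'(0) = b."""
    coeffs = []
    for lam, v in eigenpairs:
        omega = math.sqrt(lam)
        c = dot(a, v) / dot(v, v)
        d = dot(b, v) / (omega * dot(v, v))
        coeffs.append((omega, c, d, v))
    return coeffs

def combine(coeffs, t, amplitude):
    """Sum amplitude(omega, c, d, t) * v over all normal modes."""
    total = [0.0, 0.0]
    for omega, c, d, v in coeffs:
        s = amplitude(omega, c, d, t)
        total = [p + s * q for p, q in zip(total, v)]
    return total

def displacement(coeffs, t):
    return combine(coeffs, t, lambda w, c, d, t:
                   c * math.cos(w * t) + d * math.sin(w * t))

def velocity(coeffs, t):
    return combine(coeffs, t, lambda w, c, d, t:
                   w * (-c * math.sin(w * t) + d * math.cos(w * t)))

def acceleration(coeffs, t):
    return combine(coeffs, t, lambda w, c, d, t:
                   -w * w * (c * math.cos(w * t) + d * math.sin(w * t)))

a, b = [1.0, 0.0], [0.0, 0.5]          # hypothetical initial position and velocity
coeffs = mode_coefficients(a, b)

# Residual of u'' + K u = 0 at a sample time; it should vanish up to rounding.
t_sample = 0.7
u_s = displacement(coeffs, t_sample)
residual = [acc + dot(row, u_s) for acc, row in zip(acceleration(coeffs, t_sample), K)]
print(displacement(coeffs, 0.0), velocity(coeffs, 0.0), residual)
```

The printed position and velocity at time 0 reproduce the prescribed data, and the residual confirms that the combination of normal modes satisfies the second order system.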


The eigensolutions (10.74) are also known as the normal modes of vibration of the system, and the $\omega_i = \sqrt{\lambda_i}$ its natural frequencies, which are the square roots of the eigenvalues of the stiffness matrix $K$. Each eigensolution is a periodic, vector-valued function of period $P_i = 2\pi/\omega_i$. Linear combinations of such periodic functions are called quasi-periodic, because they are not, typically, periodic!

A simple example is provided by the family of functions

$$f(t) = \cos t + \cos \omega t.$$

If $\omega = p/q \in \mathbb{Q}$ is a rational number, so $p, q \in \mathbb{Z}$ with $q > 0$, then $f(t)$ is a periodic function, since $f(t + 2\pi q) = f(t)$, where $2\pi q$ is the minimal period, provided that $p$ and $q$ have no common factors. However, if $\omega$ is an irrational number, then $f(t)$ is not periodic. You are encouraged to carefully inspect the graphs in Figure 10.6. The first is periodic (can you spot where it begins to repeat?), whereas the second is only quasi-periodic and never quite succeeds in repeating its behavior. The general solution (10.75) to a vibrational system is similarly quasi-periodic, and is periodic only when all the frequency ratios $\omega_i/\omega_j$ are rational numbers. To the uninitiated, such quasi-periodic motions may appear to be rather chaotic,† even though they are built from a few simple periodic constituents. Most structures and circuits exhibit quasi-periodic vibrational motions. Let us analyze a couple of simple examples.
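The contrast between rational and irrational frequencies can be explored numerically. In the plain-Python sketch below, the choice $\omega = 7/3$, the comparison value $\sqrt{5}$, the sampling window, and the function names are all assumptions made for illustration. Note that the second test only shows that the shift $2\pi q$ fails to be a period when $\omega = \sqrt{5}$; a finite computation cannot by itself prove that no period exists.

```python
import math

def f(t, omega):
    return math.cos(t) + math.cos(omega * t)

def max_shift_error(omega, shift, samples=200, t_max=20.0):
    """Largest |f(t + shift) - f(t)| over equally spaced times in [0, t_max]."""
    times = (t_max * k / samples for k in range(samples + 1))
    return max(abs(f(t + shift, omega) - f(t, omega)) for t in times)

p, q = 7, 3
shift = 2 * math.pi * q

rational_err = max_shift_error(p / q, shift)             # omega = 7/3
irrational_err = max_shift_error(math.sqrt(5.0), shift)  # omega = sqrt(5)
print(rational_err, irrational_err)
```

For $\omega = 7/3$ the shifted and unshifted values agree up to rounding error, as the periodicity $f(t + 2\pi q) = f(t)$ predicts, whereas for $\omega = \sqrt{5}$ the same shift changes the function by an amount of order 1.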

† This is not true chaos, which is an inherently nonlinear phenomenon, [56].

Example 10.39. Consider a chain consisting of two equal unit masses connected to top and bottom supports by three springs, as in Figure 10.7, with incidence matrix

$$A = \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 0 & -1 \end{pmatrix}.$$

If the spring constants are $c_1, c_2, c_3$ (labeled from top to bottom), then the stiffness matrix is

$$K = A^T C A = \begin{pmatrix} c_1 + c_2 & -c_2 \\ -c_2 & c_2 + c_3 \end{pmatrix}.$$

Figure 10.7. Motion of a Double Mass-Spring Chain with Fixed Supports.

The eigenvalues and eigenvectors of $K$ will prescribe the normal modes and vibrational frequencies of our two-mass chain.

Let us look in detail at the case when the springs are identical, and choose our units so that $c_1 = c_2 = c_3 = 1$. The resulting stiffness matrix $K = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$ has eigenvalues and eigenvectors

$$\lambda_1 = 1, \quad \mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \lambda_2 = 3, \quad \mathbf{v}_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

The general solution to the system is then

$$\mathbf{u}(t) = r_1 \cos(t - \delta_1) \begin{pmatrix} 1 \\ 1 \end{pmatrix} + r_2 \cos\bigl(\sqrt{3}\, t - \delta_2\bigr) \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

The first summand is the normal mode vibrating at the relatively slow frequency $\omega_1 = 1$, with the two masses moving in tandem. The second normal mode vibrates faster, with frequency $\omega_2 = \sqrt{3} \approx 1.73205$, in which the two masses move in opposing directions. The general motion is a linear combination of these two normal modes. Since the frequency ratio $\omega_2/\omega_1 = \sqrt{3}$ is irrational, the motion is quasi-periodic. The system never quite returns to its initial configuration, unless it happens to be vibrating in one of the normal modes. A graph of some typical displacements of the masses is plotted in Figure 10.7.

If we eliminate the bottom spring, so the masses are just hanging from the top support as in Figure 10.8, then the reduced incidence matrix $A^\star = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$ loses its last row.
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/  //  /  / 
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Figure  10.8.  Motion  of  a  Double  Mass-Spring  Chain  with  One  Free  End. 


Assuming  that  the  springs  have  unit  stiffnesses  cx  —  c2  —  1,  the  corresponding  reduced 
stiffness  matrix  is 


K *  =  (A*)  A*  = 
The  eigenvalues  and  eigenvectors  are 


★  \T  A*  /  1  — 1  \  /  1  0 


0 


1 


1  1 


2  -1 

1  1 


A,  = 


3  —  y/E 


=  I  V5  +  1 


= 


3  +  VE 


v2  = 


1 

VE-i 


2  /  \  2 
The  general  solution  to  the  system  is  the  quasi-periodic  linear  combination 


,  x  .  y/5-l 

u(t)  —  r1  cos  I  — — —  t  —  o 


1 


VE-i 


3  —  y/E  y/E  —  1 

The  slower  normal  mode,  with  frequency  uq  =  \l -  =  -  .61803,  has  the 

V  2  2 

y/E  1 

masses  moving  in  tandem,  with  the  bottom  mass  moving  proportionally - 1.61803 


farther.  The  faster  normal  mode,  with  frequency  oj2  =  ^  ^  =  ^^2  —  ~  1-61803, 

has  the  masses  moving  in  opposing  directions,  with  the  top  mass  experiencing  the  larger 
displacement.  Thus,  removing  the  bottom  support  has  caused  both  modes  to  vibrate 
slower.  A  typical  solution  is  plotted  in  Figure  10.8. 
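The two cases of Example 10.39 can be checked numerically. The following Python sketch is not part of the original text: the helper names `sym2x2_eigen` and `chain_frequencies` are illustrative, and the code assumes unit masses, so that the vibrational frequencies are the square roots of the eigenvalues of the stiffness matrix with entries c1 + c2, -c2, -c2, c2 + c3.

```python
import math

def sym2x2_eigen(a, b, d):
    """Eigenvalues (ascending) of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def chain_frequencies(c1, c2, c3=0.0):
    """Frequencies of two unit masses on a chain; c3 = 0 removes the bottom spring."""
    lam1, lam2 = sym2x2_eigen(c1 + c2, -c2, c2 + c3)
    return math.sqrt(lam1), math.sqrt(lam2)

fixed = chain_frequencies(1.0, 1.0, 1.0)  # both ends attached to supports
free = chain_frequencies(1.0, 1.0)        # bottom end hanging free

print([round(w, 5) for w in fixed])  # [1.0, 1.73205]
print([round(w, 5) for w in free])   # [0.61803, 1.61803]
```

Both frequencies drop when the bottom spring is removed, in agreement with the closing remark of the example.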

Example 10.40. Consider a three mass-spring chain, with unit springs and masses, and both ends attached to fixed supports. The stiffness matrix

$$K = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}$$

has eigenvalues and eigenvectors

$$\lambda_1 = 2 - \sqrt{2}, \qquad \lambda_2 = 2, \qquad \lambda_3 = 2 + \sqrt{2},$$

$$v_1 = \begin{pmatrix} 1 \\ \sqrt{2} \\ 1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ -\sqrt{2} \\ 1 \end{pmatrix}.$$

The three normal modes, from slowest to fastest, have frequencies

(a) $\omega_1 = \sqrt{2 - \sqrt{2}}$: all three masses move in tandem, with the middle one moving $\sqrt{2}$ times as far.

(b) $\omega_2 = \sqrt{2}$: the two outer masses move in opposing directions, while the middle mass does not move.

(c) $\omega_3 = \sqrt{2 + \sqrt{2}}$: the two outer masses move in tandem, while the inner mass moves $\sqrt{2}$ times as far in the opposing direction.

The general motion is a quasi-periodic combination of these three normal modes. As such, to the naked eye it can look very complicated. Our mathematical analysis unmasks the innate simplicity, where the complex dynamics are, in fact, entirely governed by just three fundamental modes of vibration.
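As a sanity check on Example 10.40, the short Python sketch below (an illustration added here, with the helper name `matvec` chosen for convenience) verifies that each listed pair satisfies K v = λ v for the unit three-mass chain with fixed ends, and then evaluates the three frequencies as the square roots of the eigenvalues.

```python
import math

# Stiffness matrix of three unit masses joined by unit springs, both ends fixed.
K = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]

def matvec(A, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

r2 = math.sqrt(2.0)
modes = [
    (2 - r2, [1.0, r2, 1.0]),    # slowest: all masses in tandem
    (2.0, [-1.0, 0.0, 1.0]),     # middle mass at rest
    (2 + r2, [1.0, -r2, 1.0]),   # fastest: middle mass opposes the outer two
]

# Largest entry of K v - lambda v for each mode; should vanish up to round-off.
residuals = []
for lam, v in modes:
    Kv = matvec(K, v)
    residuals.append(max(abs(kv - lam * x) for kv, x in zip(Kv, v)))

frequencies = [math.sqrt(lam) for lam, _ in modes]
print(residuals)
print(frequencies)
```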


Exercises 


10.5.1. A 6 kilogram mass is connected to a spring with stiffness 21 kg/sec$^2$. Determine the
frequency  of  vibration  in  hertz  (cycles  per  second). 

10.5.2.  The  lowest  audible  frequency  is  about  20  hertz  =  20  cycles  per  second.  How  small  a 
mass  would  need  to  be  connected  to  a  unit  spring  to  produce  a  fast  enough  vibration  to  be 
audible?  (As  always,  we  assume  the  spring  has  negligible  mass,  which  is  probably  not  so 
reasonable  in  this  situation.) 


10.5.3. Graph the following functions. Which are periodic? quasi-periodic? If periodic, what is the (minimal) period? (a) $\sin 4t + \cos 6t$, (b) $1 + \sin \pi t$, (c) $\cos [?]\,\pi t + \cos [?]\,\pi t$, (d) $\cos t + \cos \pi t$, (e) $\sin [?]\,t + \sin [?]\,t + \sin [?]\,t$, (f) $\cos t + \cos \sqrt{2}\,t + \cos 2t$, (g) $\sin t \,\sin 3t$.


10.5.4. What is the minimal period of a function of the form $\cos \frac{p}{q}\,t + \cos \frac{r}{s}\,t$, assuming that each fraction is in lowest terms, i.e., its numerator and denominator have no common factors?


10.5.5. (a) Determine the natural frequencies of the Newtonian system

$$\frac{d^2 u}{dt^2} + \begin{pmatrix} 3 & 2 \\ 2 & 6 \end{pmatrix} u = 0.$$

(b) What is the dimension of the space of solutions? Explain your answer.

(c) Write out the general solution. (d) For which initial conditions is the resulting motion (i) periodic? (ii) quasi-periodic? (iii) both? (iv) neither? Justify your answer.


10.5.6. Answer Exercise 10.5.5 for the system

$$\frac{d^2 u}{dt^2} + \begin{pmatrix} 73 & 36 \\ 36 & 52 \end{pmatrix} u = 0.$$


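10.5.7. Find the general solution to the following second order systems:

(a) $\dfrac{d^2 u}{dt^2} = -3u + 2v, \quad \dfrac{d^2 v}{dt^2} = 2u - 3v.$

(b) $\dfrac{d^2 u}{dt^2} = -11u - 2v, \quad \dfrac{d^2 v}{dt^2} = -2u - 14v.$

(c) $\dfrac{d^2 u}{dt^2} + \begin{pmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 9 \end{pmatrix} u = 0.$

(d) $\dfrac{d^2 u}{dt^2} = \begin{pmatrix} -6 & 4 & -1 \\ 4 & -6 & 1 \\ -1 & 1 & -11 \end{pmatrix} u.$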


10.5.8.  Two  masses  are  connected  by  three  springs  to  top  and  bottom  supports.  Can  you  find 
a  collection  of  spring  constants  c1,c2,c3  such  that  all  vibrations  are  periodic? 


4*  10.5.9.  Suppose  the  bottom  support  in  the  mass-spring  chain  in  Example  10.40  is  removed. 

(a) Do you predict that the vibration rate will (i) speed up, (ii) slow down, or (iii) stay
the  same?  (b)  Verify  your  prediction  by  computing  the  new  vibrational  frequencies. 

(c)  Suppose  the  middle  mass  is  displaced  by  a  unit  amount  and  then  let  go.  Compute  and 
graph  the  solutions  in  both  situations.  Discuss  what  you  observe. 
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10.5.10.  Show  that  a  single  mass  that  is  connected  to  both  the  top  and  bottom  supports  by 
two springs of stiffnesses $c_1$, $c_2$ will vibrate in the same manner as if it were connected to
only one support by a spring with the combined stiffness $c = c_1 + c_2$.

4b  10.5.11.  (a)  Describe,  quantitatively  and  qualitatively,  the  normal  modes  of  vibration  for  a 
mass-spring  chain  consisting  of  3  unit  masses,  connected  to  top  and  bottom  supports  by 
unit  springs,  (b)  Answer  the  same  question  when  the  bottom  support  is  removed. 

T  10.5.12.  Find  the  vibrational  frequencies  for  a  mass-spring  chain  with  n  identical  masses, 

connected by $n + 1$ identical springs to both top and bottom supports. Is there any sort of
limiting behavior as $n \to \infty$? Hint: See Exercise 8.2.48.

X  10.5.13.  Suppose  the  illustrated  planar  structure  has  unit  masses  at  the 

nodes  and  the  bars  are  all  of  unit  stiffness,  (a)  Write  down  the  system 
of  differential  equations  that  describes  the  dynamical  vibrations  of 
the  structure,  (b)  How  many  independent  modes  of  vibration  are 
there?  (c)  Find  numerical  values  for  the  vibrational  frequencies. 

(d)  Describe  what  happens  when  the  structure  vibrates  in  each  of 
the  normal  modes,  (e)  Suppose  the  left-hand  mass  is  displaced  a  unit 
horizontal  distance.  Determine  the  subsequent  motion. 

10.5.14. When does a homogeneous real first order linear system $\dot{u} = A u$ have a quasi-periodic
solution?  What  is  the  smallest  dimension  in  which  this  can  occur? 

X  10.5.15.  Suppose  you  are  given  n  different  springs.  In  which  order  should  you  connect  them  to 
unit  masses  so  that  the  mass-spring  chain  vibrates  the  fastest?  Does  your  answer  depend 
upon  the  relative  sizes  of  the  spring  constants?  Does  it  depend  upon  whether  the  bottom 
mass  is  attached  to  a  support  or  left  hanging  free?  First  try  the  case  of  three  springs  with 
spring stiffnesses $c_1 = 1$, $c_2 = 2$, $c_3 = 3$. Then try varying the stiffnesses. Finally, predict
what  will  happen  with  4  or  5  springs,  and  see  whether  you  can  make  a  general  conjecture. 


Unstable  Structures 

So far, we have just dealt with the stable case, in which the stiffness matrix $K$ is positive definite. Unstable configurations, which can admit rigid motions and/or mechanisms, will provide additional complications. The simplest is a single mass that is not attached to any spring. Since the mass experiences no restraining force, its motion is governed by the elementary second order ordinary differential equation

$$m\,\frac{d^2 u}{dt^2} = 0. \qquad (10.78)$$

The general solution is

$$u(t) = c\,t + d. \qquad (10.79)$$

If $c = 0$, the mass sits at a fixed position, while when $c \ne 0$, it moves along a straight line with constant velocity.

More generally, suppose that the stiffness matrix $K$ for our structure is only positive semi-definite. Each vector $0 \ne v \in \ker K$ represents a mode of instability of the system. Since $K v = 0$, the vector $v$ is a null eigenvector with associated eigenvalue $\lambda = 0$. Lemma 10.38 provides us with two solutions to the dynamical equations (10.70) of "frequency" $\omega = \sqrt{\lambda} = 0$. The first, $u(t) = \cos(\omega t)\, v = v$, is a constant solution, i.e., an equilibrium configuration of the system. Thus, an unstable system does not have a unique equilibrium position, since every null eigenvector $v \in \ker K$ is a constant solution. On the other hand, the second solution, $u(t) = \sin(\omega t)\, v = 0$, is trivial, and so doesn't help in constructing the requisite $2n$ linearly independent basis solutions. To find the missing solution(s), let us again argue in analogy with the scalar case (10.79), and try $u(t) = t\,v$. Fortunately, this works, since $\dot{u} = v$, so $\ddot{u} = 0$. Also, $K u = t\,K v = 0$, and hence $u(t) = t\,v$ solves the system $\ddot{u} + K u = 0$. Therefore, to each element of the kernel of the stiffness matrix (i.e., each rigid motion and mechanism) there is a two-dimensional family of solutions

$$u(t) = (c\,t + d)\, v. \qquad (10.80)$$

When $c = 0$, the solution $u(t) = d\,v$ reduces to a constant equilibrium; when $c \ne 0$, it is moving off to $\infty$ with constant velocity in the null direction $v$, and so represents an unstable mode of the system. The general solution will be a linear superposition of the vibrational modes corresponding to the positive eigenvalues and the unstable linear motions corresponding to the independent null eigenvectors.

Figure 10.9. A Triatomic Molecule.

Remark. If the null direction $v \in \ker K$ represents a rigid translation, then the entire
structure  will  move  in  that  direction.  If  v  represents  an  infinitesimal  rotation,  then,  because 
our  model  is  based  on  a  linear  approximation  to  the  true  nonlinear  motions,  the  individual 
masses  will  move  along  straight  lines,  which  are  the  tangent  approximations  to  the  circular 
motion  that  occurs  in  the  true  physical,  nonlinear  regime.  We  refer  to  the  earlier  discussion 
in  Chapter  6  for  details.  Finally,  if  we  excite  a  mechanism,  then  the  masses  will  again 
follow  straight  lines,  moving  in  different  directions,  whereas  in  the  nonlinear  real  world  the 
masses  may  move  along  much  more  complicated  curved  trajectories.  For  small  motions, 
the  distinction  is  not  so  important,  while  larger  displacements,  such  as  occur  in  the  design 
of  robots,  platforms,  and  autonomous  vehicles,  [57,  75],  will  require  dealing  with  the  vastly 
more  complicated  nonlinear  dynamical  equations. 


Example 10.41. Consider a system of three unit masses connected in a line by two unit springs, but not attached to any fixed supports, as illustrated in Figure 10.9. This chain could be viewed as a simplified model of an (unbent) triatomic molecule that is allowed to move only in the vertical direction. The incidence matrix is $A = \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}$, and, since we are dealing with unit springs, the stiffness matrix is

$$K = A^T A = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}.$$

The eigenvalues and eigenvectors of $K$ are easily found:

$$\lambda_1 = 0, \quad v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad \lambda_2 = 1, \quad v_2 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \qquad \lambda_3 = 3, \quad v_3 = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}.$$

Each positive eigenvalue provides two trigonometric solutions, while the zero eigenvalue leads to solutions that are constant or depend linearly on $t$. This yields the required six basis solutions:

$$u_1(t) = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad u_3(t) = \begin{pmatrix} -\cos t \\ 0 \\ \cos t \end{pmatrix}, \qquad u_5(t) = \begin{pmatrix} \cos \sqrt{3}\, t \\ -2 \cos \sqrt{3}\, t \\ \cos \sqrt{3}\, t \end{pmatrix},$$

$$u_2(t) = \begin{pmatrix} t \\ t \\ t \end{pmatrix}, \qquad u_4(t) = \begin{pmatrix} -\sin t \\ 0 \\ \sin t \end{pmatrix}, \qquad u_6(t) = \begin{pmatrix} \sin \sqrt{3}\, t \\ -2 \sin \sqrt{3}\, t \\ \sin \sqrt{3}\, t \end{pmatrix}.$$

The first solution $u_1(t)$ is a constant, equilibrium mode, where the masses rest at a fixed common distance from their reference positions. The second solution $u_2(t)$ is the unstable mode, corresponding to a uniform rigid translation of the molecule that does not stretch the interconnecting springs. The final four solutions represent vibrational modes. In the first pair, $u_3(t), u_4(t)$, the two outer masses move in opposing directions, while the middle mass remains fixed, while the final pair, $u_5(t), u_6(t)$, has the two outer masses moving in tandem, while the inner mass moves twice as far in the opposite direction. The general solution is a linear combination of the six normal modes,

$$u(t) = c_1 u_1(t) + \cdots + c_6 u_6(t), \qquad (10.81)$$

and corresponds to the entire molecule moving at a fixed velocity while the individual masses perform a quasi-periodic vibration.

Let us see whether we can predict the motion of the molecule from its initial conditions

$$u(0) = a, \qquad \dot{u}(0) = b,$$

where $a = (a_1, a_2, a_3)^T$ indicates the initial displacements of the three atoms, while $b = (b_1, b_2, b_3)^T$ are their initial velocities. Substituting the solution formula (10.81) leads to the two linear systems

$$c_1 v_1 + c_3 v_2 + c_5 v_3 = a, \qquad c_2 v_1 + c_4 v_2 + \sqrt{3}\, c_6 v_3 = b,$$

for the coefficients $c_1, \ldots, c_6$. As in (10.77), we can use the orthogonality of the eigenvectors to immediately compute the coefficients:

$$c_1 = \frac{a \cdot v_1}{\| v_1 \|^2} = \frac{a_1 + a_2 + a_3}{3}, \qquad c_2 = \frac{b \cdot v_1}{\| v_1 \|^2} = \frac{b_1 + b_2 + b_3}{3},$$

$$c_3 = \frac{a \cdot v_2}{\| v_2 \|^2} = \frac{-a_1 + a_3}{2}, \qquad c_4 = \frac{b \cdot v_2}{\| v_2 \|^2} = \frac{-b_1 + b_3}{2},$$

$$c_5 = \frac{a \cdot v_3}{\| v_3 \|^2} = \frac{a_1 - 2 a_2 + a_3}{6}, \qquad c_6 = \frac{b \cdot v_3}{\sqrt{3}\, \| v_3 \|^2} = \frac{b_1 - 2 b_2 + b_3}{6 \sqrt{3}}.$$

In particular, the unstable translational mode is excited if and only if $c_2 \ne 0$, and this occurs if and only if there is a nonzero net initial velocity of the molecule: $b_1 + b_2 + b_3 \ne 0$. In this case, the vibrating molecule will run off to $\infty$ at a uniform velocity $c = c_2 = \frac{1}{3}(b_1 + b_2 + b_3)$ equal to the average of the individual initial velocities. On the other hand, if $b_1 + b_2 + b_3 = 0$, then the atoms will vibrate quasi-periodically, with frequencies $1$ and $\sqrt{3}$, around the molecule's fixed center of mass.
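The coefficient formulas of Example 10.41 translate directly into a few lines of code. The sketch below is an added illustration, not the text's own procedure: the function names `coefficients` and `solution` are hypothetical, and the eigenvectors are taken as (1, 1, 1), (-1, 0, 1), (1, -2, 1), with frequencies 0, 1 and sqrt(3). It projects the initial data onto the orthogonal eigenvectors and then evaluates the motion of the three atoms.

```python
import math

# Orthogonal eigenvectors of K for eigenvalues 0, 1, 3 (sign convention assumed).
V1, V2, V3 = (1.0, 1.0, 1.0), (-1.0, 0.0, 1.0), (1.0, -2.0, 1.0)
ROOT3 = math.sqrt(3.0)

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def coefficients(a, b):
    """Coefficients c1..c6 for initial displacements a and initial velocities b."""
    c1 = dot(a, V1) / dot(V1, V1)
    c2 = dot(b, V1) / dot(V1, V1)
    c3 = dot(a, V2) / dot(V2, V2)
    c4 = dot(b, V2) / dot(V2, V2)
    c5 = dot(a, V3) / dot(V3, V3)
    c6 = dot(b, V3) / (ROOT3 * dot(V3, V3))
    return c1, c2, c3, c4, c5, c6

def solution(a, b, t):
    """Displacements of the three atoms at time t."""
    c1, c2, c3, c4, c5, c6 = coefficients(a, b)
    w1 = c1 + c2 * t                                             # rigid translation
    w2 = c3 * math.cos(t) + c4 * math.sin(t)                     # frequency 1
    w3 = c5 * math.cos(ROOT3 * t) + c6 * math.sin(ROOT3 * t)     # frequency sqrt(3)
    return tuple(w1 * p + w2 * q + w3 * r for p, q, r in zip(V1, V2, V3))

a = (0.0, 0.0, 0.0)   # atoms start at their reference positions
b = (3.0, 0.0, 0.0)   # only the first atom is given an initial velocity
drift = coefficients(a, b)[1]   # average of the initial velocities
print(drift)                    # 1.0
```

Because the components of the two vibrational eigenvectors sum to zero, the average of the three displacements returned by `solution` equals c1 + c2 t, which is the uniform drift of the center of mass described in the example.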

The  observations  established  in  this  example  hold,  in  fact,  in  complete  generality.  Let 
us  state  the  result,  leaving  the  details  of  the  proof  as  an  exercise  for  the  reader. 

Theorem 10.42. The general solution to an unstable second order linear system $\ddot{u} + K u = 0$ with positive semi-definite coefficient matrix $K \ge 0$ is a linear combination of quasi-periodic or periodic vibrations and a uniform linear motion at a fixed velocity in the direction of a null eigenvector $v \in \ker K$. In particular, the system will just vibrate around a fixed position if and only if the initial velocity $\dot{u}(t_0) \in (\ker K)^\perp = \operatorname{img} K$ lies in the image of the coefficient matrix.

As in Chapter 6, the unstable modes $v \in \ker K$ correspond to either rigid motions or
to  mechanisms  of  the  structure.  Thus,  to  prevent  a  structure  from  exhibiting  an  unstable 
motion,  one  has  to  ensure  that  the  initial  velocity  is  orthogonal  to  all  of  the  unstable  modes. 
(The  value  of  the  initial  position  is  not  an  issue.)  This  is  the  dynamical  counterpart  of 
the  requirement  that  an  external  force  be  orthogonal  to  all  unstable  modes  in  order  to 
maintain  equilibrium  in  the  structure,  as  in  Theorem  6.8. 

Systems  with  Differing  Masses 

When  a  chain  or  structure  has  different  masses  at  the  nodes,  the  (unforced)  Newtonian 
equations  of  motion  take  the  more  general  form 

$$M\,\frac{d^2 u}{dt^2} + K u = 0, \quad \text{or, equivalently,} \quad \frac{d^2 u}{dt^2} = -M^{-1} K u = -P u. \qquad (10.82)$$

The  mass  matrix  M  is  always  positive  definite  (and,  almost  always,  diagonal,  although 
this is not required by the general theory), while the stiffness matrix $K = A^T C A$ is either
positive definite or, in the unstable situation when $\ker A \ne \{0\}$, positive semi-definite. The
coefficient matrix

$$P = M^{-1} K = M^{-1} A^T C A \qquad (10.83)$$

is  not  in  general  symmetric,  and  so  we  cannot  directly  apply  the  preceding  constructions. 
However,  P  does  have  the  more  general  self-adjoint  form  (7.85)  based  on  the  weighted 
inner  products 

$$\langle u , \tilde{u} \rangle = u^T M \tilde{u}, \qquad \langle\!\langle v , \tilde{v} \rangle\!\rangle = v^T C \tilde{v}, \qquad (10.84)$$

on,  respectively,  the  domain  and  codomain  of  the  (reduced)  incidence  matrix  A.  Moreover, 
in  the  stable  case  when  ker  A  =  {0},  the  matrix  P  is  positive  definite  in  the  generalized 
sense  of  Definition  7.59. 

To  solve  the  system  of  differential  equations,  we  substitute  the  same  trigonometric 
solution ansatz $u(t) = \cos(\omega t)\, v$. This results in a generalized eigenvalue equation

$$K v = \lambda M v, \quad \text{or, equivalently,} \quad P v = \lambda v, \quad \text{with} \quad \lambda = \omega^2. \qquad (10.85)$$

The matrix $M$ assumes the role of the identity matrix in the standard eigenvalue equation (8.13), and $\lambda$ is a generalized eigenvalue if and only if it satisfies the generalized characteristic equation

$$\det(K - \lambda M) = 0. \qquad (10.86)$$

According to Exercise 8.5.8, if $M > 0$ and $K \ge 0$, then all the generalized eigenvalues are real and non-negative. Moreover, the generalized eigenvectors form an orthogonal basis of $\mathbb{R}^n$, but now with respect to the weighted inner product (10.84) prescribed by the mass matrix $M$. The general solution is a quasi-periodic linear combination of the eigensolutions, of the same form as in (10.75). In the unstable case, when $K \ge 0$ is only positive semi-definite (but $M$ necessarily remains positive definite), one must include enough generalized null eigenvectors to span $\ker K$, each of which leads to an unstable mode of the form (10.80). Further details are relegated to the exercises.
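For two masses, the generalized characteristic equation (10.86) is a quadratic that can be solved by hand or in a few lines of code. The Python sketch below is an added illustration under stated assumptions: the function name `generalized_eigen_2x2` is hypothetical, the mass matrix is diagonal, the off-diagonal stiffness entry is nonzero, and the sample masses 1 and 4 with stiffness entries 2, -1, -1, 2 are made up for the demonstration.

```python
import math

def generalized_eigen_2x2(k11, k12, k22, m1, m2):
    """Solve det(K - lam*M) = 0 for K = [[k11, k12], [k12, k22]], M = diag(m1, m2).

    Returns a list of (lam, v) pairs with K v = lam M v. Assumes k12 != 0.
    """
    # det(K - lam M) = m1*m2*lam^2 - (k11*m2 + k22*m1)*lam + (k11*k22 - k12^2)
    qa = m1 * m2
    qb = -(k11 * m2 + k22 * m1)
    qc = k11 * k22 - k12 * k12
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    pairs = []
    for lam in ((-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)):
        # The first row of (K - lam M) v = 0 is satisfied by this vector.
        v = (-k12, k11 - lam * m1)
        pairs.append((lam, v))
    return pairs

pairs = generalized_eigen_2x2(2.0, -1.0, 2.0, 1.0, 4.0)
(lam1, v), (lam2, w) = pairs
frequencies = (math.sqrt(lam1), math.sqrt(lam2))

# Orthogonality holds in the inner product weighted by the masses.
weighted = 1.0 * v[0] * w[0] + 4.0 * v[1] * w[1]
print(frequencies, weighted)
```

The printed weighted inner product vanishes up to round-off, whereas the ordinary dot product of the two eigenvectors does not, which illustrates why the mass matrix must be used when computing coefficients from initial data.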


Exercises 


10.5.16.  Find  the  general  solution  to  the  following  systems.  Distinguish  between  the 
vibrational  and  unstable  modes.  What  constraints  on  the  initial  conditions  ensure 


2„.  72 


that  the  unstable  modes  are  not  excited?  (a) 


d  u 
dt 2 


d  v 

=  -4 u  -2v,  —  =  -2 u 


( b ) 

d2  w 
dt 2 


d2u 
dt 2 


10.5.17.  Let  K  = 


such  that  K  =  Q  AQ 


s 

co 

J2 

d  v 

dt 2 

-  —3  u  —  9v 

4w. 

(d) 

d2u 
dt 2 

=  —  u  +  v  — 

(  3 

0 

-1\ 

0 

2 

0 

.  (a)  Find 

v-i 

0 

3) 

d2u 


9v ■  (c)  w 


=  —2  u  -\-  v 


2  w . 


d2v 
dt 2 


u 


v  +  2w. 


dt 2 

2  w , 

d2w 
dt 2 


v. 


d2v 


u 


dt 2 

=  —  2u  +  2v 


4  w . 


T 


d2  u 
dt2 


.  (a)  Find  an  orthogonal  matrix  Q  and  a  diagonal  matrix  A 


(b)  Is  iL  positive  definite?  (c)  Solve  the  second  order  system 

/1\  _j _  /0 


=  A u  subject  to  the  initial  conditions  u(0)  = 


0 

W 


du 

dt 


(0)  = 


(d)  Is  your  solution  periodic?  If  your  answer  is  yes,  indicate  the  period. 

(e)  Is  the  general  solution  to  the  system  periodic? 


10.5.18.  Answer  Exercise  10.5.17  when  A  = 


/ 


V 


2 

1 

0 


1 

1 

1 


0\ 

1 

2/ 


10.5.19.  Compare  the  solutions  to  the  mass-spring  system  (10.65)  with  tiny  spring  constant 

$k = \varepsilon \ll 1$ to those of the completely unrestrained system (10.78). Are they close? Discuss.

10.5.20.  Discuss  the  three-dimensional  motions  of  the  triatomic  molecule  of  Example  10.41. 

Are  the  vibrational  frequencies  the  same  as  those  of  the  one-dimensional  model? 


10.5.21.  So  far,  our  mass-spring  chains  have  been  allowed  to  move  only  in  the  vertical 
direction,  (a)  Set  up  the  system  governing  the  planar  motions  of  a  mass-spring  chain 
consisting  of  two  unit  masses  attached  to  top  and  bottom  supports  by  unit  springs,  where 
the  masses  are  allowed  to  move  in  the  longitudinal  and  transverse  directions.  Compare  the 
resulting  vibrational  frequencies  with  the  one-dimensional  case,  (b)  Repeat  the  analysis 
when  the  bottom  support  is  removed,  (c)  Can  you  make  any  conjectures  concerning  the 
planar  motions  of  general  mass-spring  chains? 

10.5.22.  Find  the  vibrational  frequencies  and  instabilities  of  the  following  structures,  assuming 
they  have  unit  masses  at  all  the  nodes.  Explain  in  detail  how  each  normal  mode  moves  the 
structure:  (a)  the  three  bar  planar  structure  in  Figure  6.13;  (b)  its  reinforced  version  in 
Figure  6.16;  (c)  the  swing  set  in  Figure  6.18. 

4b  10.5.23.  Assuming  unit  masses  at  the  nodes,  find  the  vibrational  frequencies  and  describe  the 
normal  modes  for  the  following  planar  structures.  What  initial  conditions  will  not  excite  its 
instabilities?  (a)  An  equilateral  triangle;  (b)  a  square;  (c)  a  regular  hexagon. 

4b  10.5.24.  Answer  Exercise  10.5.23  for  the  three-dimensional  motions  of  a  regular  tetrahedron. 
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? ?  10.5.25.  (a)  Show  that  if  a  structure  contains  all  unit  masses  and  bars  with  unit  stiffness, 

$c_i = 1$, then its frequencies of vibration are the nonzero singular values of the reduced
incidence  matrix,  (b)  How  would  you  recognize  when  a  structure  is  close  to  being  unstable? 


10.5.26. Prove that if the initial velocity satisfies $\dot{u}(t_0) = b \in \operatorname{coimg} A$, then the solution to the initial value problem (10.70, 76) remains bounded.


10.5.27.  Find  the  general  solution  to  the  system  (10.82)  for  the  following  matrix  pairs: 


(a)  M 
(c)  M 

(e)  M 


(b)  M  = 
(d)  M  = 

(f)  M  = 


0 

3 

o  o 

K  = 

(  5 
-1 

-1 

6 

0 

1 

6/ 

°\ 

l-l 
(1  2 

3 

0\ 

3 

1  ’ 

K  = 

2  8 

2 

1 

1/ 

V0  2 

1/ 

10.5.28. A mass-spring chain consists of two masses, $m_1 = 1$ and $m_2 = 2$, connected to top and
bottom  supports  by  identical  springs  with  unit  stiffness.  The  upper  mass  is  displaced  by  a 
unit  distance.  Find  the  subsequent  motion  of  the  system. 


10.5.29.  Answer  Exercise  10.5.28  when  the  bottom  support  is  removed. 

X  10.5.30.  (a)  A  water  molecule  consists  of  two  hydrogen  atoms  connected  at  an  angle  of  105° 
to  an  oxygen  atom  whose  relative  mass  is  16  times  that  of  each  of  the  hydrogen  atoms. 

If  the  molecular  bonds  are  modeled  as  linear  unit  springs,  determine  the  fundamental 
frequencies  and  describe  the  corresponding  vibrational  modes,  (b)  Do  the  same  for  a 
carbon  tetrachloride  molecule,  in  which  the  chlorine  atoms,  with  atomic  weight  35,  are 
positioned  on  the  vertices  of  a  regular  tetrahedron  and  the  carbon  atom,  with  atomic 
weight  12,  is  at  the  center,  (c)  Finally  try  a  benzene  molecule,  consisting  of  6  carbon 
atoms  arranged  in  a  regular  hexagon.  In  this  case,  every  other  bond  is  double  strength 
because  two  electrons  are  shared.  (Ignore  the  six  extra  hydrogen  atoms  for  simplicity.) 

T  10.5.31.  Repeat  Exercise  10.5.21  for  fully  3-dimensional  motions  of  the  chain. 

4b  10.5.32. Suppose you have masses $m_1 = 1$, $m_2 = 2$, $m_3 = 3$, connected to top and bottom
supports  by  identical  unit  springs.  Does  rearranging  the  order  of  the  masses  change  the 
fundamental  frequencies?  If  so,  which  order  produces  the  fastest  vibrations? 


0  10.5.33. Suppose $M$ is a nonsingular matrix. Prove that $\lambda$ is a generalized eigenvalue of the matrix pair $K$, $M$ if and only if it is an ordinary eigenvalue of the matrix $P = M^{-1} K$. How are the eigenvectors related? How are the characteristic equations related?

10.5.34. Suppose that $u(t)$ is a solution to (10.82). Let $N = \sqrt{M}$ denote the positive definite square root of the mass matrix $M$, as defined in Exercise 8.5.27. (a) Prove that the "weighted" displacement vector $\tilde{u}(t) = N u(t)$ solves $d^2 \tilde{u}/dt^2 = -\tilde{K} \tilde{u}$, where $\tilde{K} = N^{-1} K N^{-1}$ is a symmetric, positive semi-definite matrix. (b) Explain in what sense this can serve as an alternative to the generalized eigenvector solution method.

0  10.5.35.  Provide  the  details  of  the  proof  of  Theorem  10.42. 

0  10.5.36.  State  and  prove  the  counterpart  of  Theorem  10.42  for  the  variable  mass  system  (10.82). 


Friction  and  Damping 

We have not yet allowed friction to affect the motion of our dynamical equations. In the standard physical model, the frictional force on a mass in motion is directly proportional to its velocity, [31]. For the simplest case of a single mass attached to a spring, one amends the balance of forces in the undamped Newtonian equation (10.65) to obtain

$$m\,\frac{d^2 u}{dt^2} + \beta\,\frac{du}{dt} + k u = 0. \qquad (10.87)$$

As before, $m > 0$ is the mass, and $k > 0$ the spring stiffness, while $\beta > 0$ measures the effect of a velocity-dependent frictional force: the larger the value of $\beta$, the greater the frictional force.

Figure 10.10. Damped Vibrations. (Panel shown: Overdamped.)

The solution of this more general second order homogeneous linear ordinary differential equation is found by substituting the usual exponential ansatz $u(t) = e^{\lambda t}$, reducing it to the quadratic characteristic equation

$$m \lambda^2 + \beta \lambda + k = 0. \qquad (10.88)$$

Assuming that $m, \beta, k > 0$, there are three possible cases:

Underdamped: If $0 < \beta < 2\sqrt{m k}$, then (10.88) has two complex-conjugate roots:

$$\lambda = -\frac{\beta}{2m} \pm i\,\frac{\sqrt{4 m k - \beta^2}}{2m} = -\mu \pm i\,\nu. \qquad (10.89)$$

The general solution to the differential equation,

$$u(t) = e^{-\mu t} \bigl( c_1 \cos \nu t + c_2 \sin \nu t \bigr) = r\, e^{-\mu t} \cos(\nu t - \delta), \qquad (10.90)$$

represents a damped periodic motion. The mass continues to oscillate at a fixed frequency

$$\nu = \frac{\sqrt{4 m k - \beta^2}}{2m} = \sqrt{\frac{k}{m} - \frac{\beta^2}{4 m^2}}, \qquad (10.91)$$

but the vibrational amplitude $r\, e^{-\mu t}$ decays to zero at an exponential rate as $t \to \infty$. Observe that, in a rigorous mathematical sense, the mass never quite returns to equilibrium, although in the real world, after a sufficiently long time the residual vibrations are not noticeable, and equilibrium is physically (but not mathematically) achieved. The rate of decay, $\mu = \beta/(2m)$, is directly proportional to the friction, and inversely proportional to the mass. Thus, greater friction and/or less mass will accelerate the return to equilibrium. The friction also has an effect on the vibrational frequency (10.91); the larger $\beta$ is, the slower the oscillations become and the more rapid the damping effect. As the friction approaches the critical threshold $\beta^\star = 2\sqrt{m k}$, the vibrational frequency goes to zero, $\nu \to 0$, and so the oscillatory period $2\pi/\nu$ becomes longer and longer.

Overdamped: If $\beta > 2\sqrt{m k}$, then the characteristic equation (10.88) has two negative real roots

$$\lambda_1 = -\frac{\beta + \sqrt{\beta^2 - 4 m k}}{2m} \; < \; \lambda_2 = -\frac{\beta - \sqrt{\beta^2 - 4 m k}}{2m} \; < \; 0.$$

The solution

$$u(t) = c_1 e^{\lambda_1 t} + c_2 e^{\lambda_2 t} \qquad (10.92)$$

is a linear combination of two decaying exponentials. An overdamped system models the motion of, say, a mass in a vat of molasses. Its "vibration" is so slow that it can pass at most once through the equilibrium position, and then only when its initial velocity is relatively large. In the long term, the first exponential in the solution (10.92) will go to zero faster, and hence, as long as $c_2 \ne 0$, the overall decay rate of the solution is governed by the dominant (least negative) eigenvalue $\lambda_2$.

Critically Damped: The borderline case occurs when $\beta = \beta^\star = 2\sqrt{m k}$, which means that the characteristic equation (10.88) has only a single negative real root:

$$\lambda_1 = -\frac{\beta}{2m}.$$

In this case, our ansatz supplies only one exponential solution $e^{\lambda_1 t} = e^{-\beta t/(2m)}$. A second independent solution is obtained by multiplication by $t$, leading to the general solution

$$u(t) = (c_1 t + c_2)\, e^{-\beta t/(2m)}. \qquad (10.93)$$


Even  though  the  formula  looks  quite  different,  its  qualitative  behavior  is  very  similar  to 
the  overdamped  case.  The  factor  t  plays  an  unimportant  role,  since  the  asymptotics  of  this 
solution  are  almost  entirely  governed  by  the  decaying  exponential  function.  This  represents 
a non-vibrating solution that has the fastest possible decay rate: increasing the friction makes the dominant eigenvalue $\lambda_2$ less negative, while any reduction of the frictional coefficient will allow a damped, slowly oscillatory vibration to appear.

In all three cases, the zero equilibrium solution is globally asymptotically stable. Physically, no matter how small the frictional contribution, all solutions to the unforced system eventually return to equilibrium as friction overwhelms the motion.
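The three damping regimes are distinguished by the sign of the discriminant of the characteristic equation (10.88). The following Python sketch is an added illustration, with a hypothetical function name `classify_damping` and made-up sample coefficients; it returns the regime together with the two characteristic roots. The test for critical damping uses exact floating-point equality, so it is only reliable for coefficients whose discriminant is computed exactly.

```python
import cmath

def classify_damping(m, beta, k):
    """Classify m u'' + beta u' + k u = 0 (m, k > 0, beta >= 0) and return its roots."""
    disc = beta * beta - 4.0 * m * k          # discriminant of m*lam^2 + beta*lam + k
    root = cmath.sqrt(disc)                   # complex square root handles disc < 0
    roots = ((-beta - root) / (2.0 * m), (-beta + root) / (2.0 * m))
    if beta == 0:
        kind = "undamped"
    elif disc < 0:
        kind = "underdamped"
    elif disc == 0:
        kind = "critically damped"
    else:
        kind = "overdamped"
    return kind, roots

for beta in (0.0, 1.0, 2.0, 3.0):
    kind, roots = classify_damping(1.0, beta, 1.0)
    print(beta, kind, roots)
```

With m = k = 1 the critical threshold is beta = 2, so the loop passes through the undamped, underdamped, critically damped, and overdamped cases in turn; in every damped case both roots have negative real part, consistent with the asymptotic stability noted above.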

This concludes our discussion of the scalar case. Similar considerations apply to mass-spring chains, and to two- and three-dimensional structures. A frictionally damped structure is modeled by a second order system of the form

$$M\,\frac{d^2 u}{dt^2} + B\,\frac{du}{dt} + K u = 0, \qquad (10.94)$$

where the mass matrix $M$ and the matrix of frictional coefficients $B$ are both diagonal and positive definite, while the stiffness matrix $K = A^T C A \ge 0$ is a positive semi-definite Gram matrix constructed from the (reduced) incidence matrix $A$. Under these assumptions, it can
be  proved  that  the  zero  equilibrium  solution  is  globally  asymptotically  stable.  However,  the 
mathematical  details  in  this  case  are  sufficiently  intricate  that  we  shall  leave  their  analysis 
as  an  advanced  project  for  the  highly  motivated  student. 


Exercises 

10.5.37. Consider the overdamped mass-spring equation $\ddot{u} + 6\dot{u} + 5u = 0$. If the mass starts
out  a  distance  1  away  from  equilibrium,  how  large  must  the  initial  velocity  be  in  order  that 
it  pass  through  equilibrium  once? 
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10.5.38.  Solve  the  following  mass-spring  initial  value  problems,  and  classify  as  to 

(i)  over  damped,  ( ii )  critically  damped,  (Hi)  underdamped,  or  (iv)  undamped: 

(a)  21  +  6?2  +  9u  =  0,  'u(O)  =  0,  u(0)  =  1.  (b)  21  +  212  +  lOu  =  0,  'u(O)  =  1,  il(0)  =  1. 

(c)  22  +  16u  =  0,  u(l)  =  0,  u(  1)  =  1.  (d)  22  +  3il  +  9i£  =  0,  n(0)  =  0,  ii(0)  =  1. 

(e)  2ii  +  3ii  +  n  =  0,  a(0)  =  2,  ii(0)  =  0.  (f)  21  +  6ii  +  lOu  =  0,  ix(0)  =  3,  ii(0)  = -2. 

10.5.39. (a) A mass weighing 16 pounds stretches a spring 6.4 feet. Assuming no friction,
determine the equation of motion and the natural frequency of vibration of the mass-spring
system. Use the value $g = 32$ ft/sec$^2$ for the gravitational acceleration. (b) The mass-spring
system is placed in a jar of oil, whose frictional resistance equals the speed of the
mass. Assume the spring is stretched an additional 2 feet from its equilibrium position and
let go. Determine the motion of the mass. (c) Is the system overdamped or underdamped?
Are the vibrations more rapid or less rapid than in the undamped system?

10.5.40.  Suppose  you  convert  the  second  order  equation  (10.87)  into  its  phase  plane  equivalent. 
What  are  the  phase  portraits  corresponding  to  (a)  undamped,  (b)  underdamped, 

(c)  critically  damped,  and  (d)  overdamped  motion? 

♦ 10.5.41. (a) Prove that, given a non-constant solution to an overdamped mass-spring system,
there is at most one time $t^*$ where $u(t^*) = 0$. (b) Is this statement also valid in the critically
damped case?

10.5.42.  Discuss  the  possible  behaviors  of  a  mass  moving  in  a  frictional  medium  that  is  not 
attached  to  a  spring,  i.e.,  set  k  =  0  in  (10.87). 


10.6  Forcing  and  Resonance 

Up  until  now,  our  physical  system  has  been  left  free  to  vibrate  on  its  own.  Let  us  investigate 
what  happens  when  we  shake  it.  In  this  section,  we  will  consider  the  effects  of  periodic 
external  forcing  on  both  undamped  and  damped  systems. 

The  simplest  case  is  that  of  a  single  mass  connected  to  a  spring  that  has  no  frictional 
damping.  We  append  an  external  forcing  function  f(t)  to  the  homogeneous  (unforced) 
equation  (10.65),  leading  to  the  inhomogeneous  second  order  equation 


$$ m \frac{d^2u}{dt^2} + k u = f(t), \tag{10.95} $$


in  which  m  >  0  is  the  mass  and  k  >  0  the  spring  stiffness.  We  are  particularly  interested 
in  the  case  of  periodic  forcing 

$$ f(t) = \alpha \cos \gamma t \tag{10.96} $$

of frequency $\gamma > 0$ and amplitude $\alpha$. To find a particular solution to (10.95-96), we use
the method of undetermined coefficients,† which tells us to guess a trigonometric solution
ansatz of the form

$$ u^*(t) = a \cos \gamma t + b \sin \gamma t, \tag{10.97} $$

where $a$, $b$ are constants to be determined. Substituting (10.97) into the differential equation
produces

$$ m \frac{d^2u^*}{dt^2} + k u^* = a\,(k - m\gamma^2) \cos \gamma t + b\,(k - m\gamma^2) \sin \gamma t = \alpha \cos \gamma t. $$


† One can also use variation of parameters, although the intervening calculations are slightly
more complicated.

We can solve for

$$ a = \frac{\alpha}{k - m\gamma^2} = \frac{\alpha}{m(\omega^2 - \gamma^2)}, \qquad b = 0, \tag{10.98} $$

where

$$ \omega = \sqrt{\frac{k}{m}} \tag{10.99} $$

refers to the natural, unforced vibrational frequency of the system. The solution (10.98) is
valid provided its denominator is nonzero:

$$ k - m\gamma^2 = m(\omega^2 - \gamma^2) \neq 0. $$


Therefore, as long as the forcing frequency is not equal to the system’s natural frequency,
i.e., $\gamma \neq \omega$, there exists a particular solution

$$ u^*(t) = a \cos \gamma t = \frac{\alpha}{m(\omega^2 - \gamma^2)} \cos \gamma t \tag{10.100} $$


that  vibrates  at  the  same  frequency  as  the  forcing  function. 

The general solution to the inhomogeneous system (10.95) is found, as usual, by adding
in an arbitrary solution (10.66) to the homogeneous equation, yielding

$$ u(t) = \frac{\alpha}{m(\omega^2 - \gamma^2)} \cos \gamma t + r \cos(\omega t - \delta), \tag{10.101} $$

where $r$ and $\delta$ are determined by the initial conditions. The solution is therefore a quasi-periodic
combination of two simple periodic motions: the second, vibrating with frequency
$\omega$, represents the internal or natural vibrations of the system, while the first, with
frequency $\gamma$, represents the response to the periodic forcing. Due to the factor $\omega^2 - \gamma^2$ in
the denominator of the latter, the closer the forcing frequency is to the natural frequency,
the larger the overall amplitude of the response.

Suppose we start the mass at equilibrium at the initial time $t_0 = 0$, so the initial
conditions are

$$ u(0) = 0, \qquad \dot u(0) = 0. \tag{10.102} $$

Substituting (10.101) and solving for $r$, $\delta$, we find that

$$ r = -\frac{\alpha}{m(\omega^2 - \gamma^2)}, \qquad \delta = 0. $$


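Thus, the solution to the initial value problem can be written in the form

$$ u(t) = \frac{\alpha}{m(\omega^2 - \gamma^2)} \bigl( \cos \gamma t - \cos \omega t \bigr) = \frac{2\alpha}{m(\omega^2 - \gamma^2)} \, \sin\Bigl( \frac{\omega + \gamma}{2}\, t \Bigr) \sin\Bigl( \frac{\omega - \gamma}{2}\, t \Bigr), \tag{10.103} $$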

where we have employed a standard trigonometric identity, cf. Exercise 3.6.17. The first
trigonometric factor, $\sin \frac12(\omega + \gamma)t$, represents a periodic motion at a frequency equal to the
average of the natural and the forcing frequencies. If the forcing frequency $\gamma$ is close to the
natural frequency $\omega$, then the second factor, $\sin \frac12(\omega - \gamma)t$, has a much smaller frequency,
and so oscillates on a much longer time scale. As a result, it modulates the amplitude of
the more rapid vibrations, and is responsible for the phenomenon of beats, in which a rapid
vibration is subject to a slowly varying amplitude. An everyday illustration of beats is two
tuning forks that have nearby pitches. When they vibrate close to each other, the sound
you hear waxes and wanes in intensity. As a mathematical example, Figure 10.11 displays
the graph of the particular function

$$ \cos 14t - \cos 16t = 2 \sin t \, \sin 15t $$

on the interval $-\pi < t < 6\pi$. The slowly varying amplitude $2 \sin t$ is clearly visible as the
envelope of the relatively rapid vibrations at frequency 15.

Figure 10.11. Beats in a Periodically Forced Vibration.
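The beat identity above is easy to check numerically. The following is a minimal Python sketch, not part of the original text: the function names, the sampling grid, and the tolerances are illustrative assumptions. It compares the two sides of $\cos 14t - \cos 16t = 2 \sin t \sin 15t$ and confirms that the slow factor $2|\sin t|$ bounds the rapid vibration.

```python
import math

def difference_form(t, gamma=14.0, omega=16.0):
    """cos(gamma t) - cos(omega t): two vibrations with nearby frequencies."""
    return math.cos(gamma * t) - math.cos(omega * t)

def product_form(t, gamma=14.0, omega=16.0):
    """2 sin((omega - gamma) t / 2) sin((omega + gamma) t / 2)."""
    slow = math.sin(0.5 * (omega - gamma) * t)  # beat factor
    fast = math.sin(0.5 * (omega + gamma) * t)  # vibration at the average frequency
    return 2.0 * slow * fast

# Sample points covering roughly 0 <= t <= 6*pi (the spacing is an arbitrary choice).
samples = [0.01 * j for j in range(1886)]

# Largest discrepancy between the two forms; should sit at rounding-error level.
max_gap = max(abs(difference_form(t) - product_form(t)) for t in samples)

# The slowly varying amplitude 2|sin t| should bound the rapid vibration.
inside_envelope = all(
    abs(difference_form(t)) <= 2.0 * abs(math.sin(t)) + 1e-9 for t in samples
)

print(f"max gap = {max_gap:.2e}, inside envelope: {inside_envelope}")
```

Changing `gamma` and `omega` to other nearby values shows the same pattern: the closer the two frequencies, the longer each beat lasts.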

When we force the system at exactly the natural frequency $\gamma = \omega$, the trigonometric
ansatz (10.97) no longer works. This is because both terms are now solutions to the homogeneous
equation, and so cannot be combined to form a solution to the inhomogeneous
version. In this situation, there is a simple modification to the ansatz, namely multiplication
by $t$, that does the trick. Substituting

$$ u^*(t) = a\, t \cos \omega t + b\, t \sin \omega t \tag{10.104} $$

into the differential equation (10.95), we obtain

$$ m \frac{d^2u^*}{dt^2} + k u^* = -2 a m \omega \sin \omega t + 2 b m \omega \cos \omega t = \alpha \cos \omega t, $$

provided

$$ a = 0, \qquad b = \frac{\alpha}{2 m \omega}, \qquad \text{and so} \qquad u^*(t) = \frac{\alpha}{2 m \omega}\, t \sin \omega t. $$

Combining the resulting particular solution with the solution to the homogeneous equation
leads to the general solution

$$ u(t) = \frac{\alpha}{2 m \omega}\, t \sin \omega t + r \cos(\omega t - \delta). \tag{10.105} $$

Both terms vibrate with frequency $\omega$, but the amplitude of the first grows larger and larger
as $t \to \infty$. As illustrated in Figure 10.12, the mass will oscillate more and more wildly. In
this situation, the system is said to be in resonance, and the increasingly large oscillations
are provoked by forcing it at its natural frequency $\omega$. In a physical apparatus, once the
amplitude  of  resonant  vibrations  stretches  the  spring  beyond  its  elastic  limits,  the  linear 
Hooke’s  Law  model  is  no  longer  applicable,  and  either  the  spring  breaks  or  the  system 
enters  a  nonlinear  regime. 
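The relationship between the bounded near-resonant solution (10.103) and the growing resonant solution can be explored numerically. The sketch below is only an illustration: the parameter values $m = 1$, $k = 4$, $\alpha = 1$, the sample time, and the function names are assumptions, not taken from the text. It evaluates both formulas and uses a central difference to confirm that the resonant formula satisfies the forced equation.

```python
import math

def forced_response(t, gamma, m=1.0, k=4.0, alpha=1.0):
    """Solution of m u'' + k u = alpha cos(gamma t) with u(0) = u'(0) = 0."""
    omega = math.sqrt(k / m)  # natural frequency
    if gamma == omega:
        # Resonant case: the amplitude grows linearly in t.
        # (Exact float comparison is a simplification that suits this sketch.)
        return alpha / (2.0 * m * omega) * t * math.sin(omega * t)
    # Non-resonant case: bounded, but large when gamma is near omega.
    scale = alpha / (m * (omega**2 - gamma**2))
    return scale * (math.cos(gamma * t) - math.cos(omega * t))

def ode_residual(t, gamma, m=1.0, k=4.0, alpha=1.0, h=1e-4):
    """m u'' + k u - alpha cos(gamma t), with u'' from a central difference."""
    u = lambda s: forced_response(s, gamma, m, k, alpha)
    second = (u(t + h) - 2.0 * u(t) + u(t - h)) / h**2
    return m * second + k * u(t) - alpha * math.cos(gamma * t)

t = 5.0
resonant = forced_response(t, 2.0)        # forcing exactly at omega = 2
nearby = forced_response(t, 2.0 + 1e-4)   # forcing just off resonance
print(resonant, nearby, ode_residual(t, 2.0))
```

At a fixed time the two values nearly agree, which is consistent with the near-resonant solution having a very large but bounded amplitude: it tracks the resonant growth for a long while before the beat envelope turns it around.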

Furthermore, if we are very close to resonance, the oscillations induced by the particular
solution (10.103) will have extremely large, although bounded, amplitude. The lesson is,
never force a system at or close to its natural frequency (or frequencies) of vibration.
A classic example was the 1831 collapse of a bridge when a British infantry regiment
marched in unison across it, apparently inducing a resonant vibration of the structure. The
bridge in question was an early example of the suspension style, similar to that pictured
in Figure 10.13. Learning their lesson, soldiers nowadays no longer march in step across
bridges, as reminded by the sign in the photo in Figure 10.13. An even more dramatic
case is the 1940 Tacoma Narrows Bridge disaster, when the vibrations due to a strong wind
caused the bridge to oscillate wildly and break apart! The collapse was caught on film,
which can be found on YouTube, and is extremely impressive. The traditional explanation
was the excitement of the bridge’s resonant frequencies, although later studies revealed a
more sophisticated mathematical explanation of the collapse, [22; p. 118]. But resonance
is not exclusively harmful. In a microwave oven, the electromagnetic waves are tuned to
the resonant frequencies of water molecules so as to excite them into large vibrations and
thereby heat up your dinner. Blowing into a clarinet or other wind instrument excites the
resonant frequencies in the column of air contained within it, and this produces the musical
sound vibrations that we hear.

Figure 10.12. Resonance.

Figure 10.13. The Albert Bridge in London.

Frictional effects can partially mollify the extreme behavior near the resonant frequency.
The frictionally damped vibrations of a mass on a spring, when subject to periodic forcing,
are described by the inhomogeneous differential equation

$$ m \frac{d^2u}{dt^2} + \beta \frac{du}{dt} + k u = \alpha \cos \gamma t. \tag{10.106} $$

Let us assume that the friction is sufficiently small so as to be in the underdamped regime
$\beta < 2\sqrt{mk}$. Since neither summand solves the homogeneous system, we can use the
trigonometric solution ansatz (10.97) to construct the particular solution

$$ u^*(t) = \frac{\alpha}{\sqrt{m^2(\omega^2 - \gamma^2)^2 + \beta^2\gamma^2}} \cos(\gamma t - \varepsilon), \qquad \text{where} \qquad \omega = \sqrt{\frac{k}{m}} \tag{10.107} $$

continues to denote the undamped resonant frequency (10.99), while $\varepsilon$, defined by

$$ \tan \varepsilon = \frac{\beta\gamma}{m(\omega^2 - \gamma^2)}, \tag{10.108} $$

represents a frictionally induced phase lag. Thus, the larger the friction $\beta$, the more
pronounced the phase lag $\varepsilon$ in the response of the system to the external forcing. As
the forcing frequency $\gamma$ increases, so does the phase lag, which attains the value $\frac12\pi$ at
the resonant frequency $\gamma = \omega$, meaning that the system lags a quarter period behind the
forcing, and converges to its maximum $\varepsilon = \pi$ as $\gamma \to \infty$. Thus, the response to a very
high frequency forcing is almost exactly out of phase: the mass is moving downwards
when the force is pulling it upwards, and vice versa! The amplitude of the persistent
response (10.107) takes the value $\alpha/(\beta\omega)$ at the resonant frequency $\gamma = \omega$, which for small
friction is essentially its maximum (the exact peak lies slightly lower, at $\gamma^2 = \omega^2 - \beta^2/(2m^2)$).
Thus, the smaller the frictional coefficient $\beta$ (or the slower the resonant
frequency $\omega$), the more likely the breakdown of the system due to an overly large response.
The general solution is

$$ u(t) = \frac{\alpha}{\sqrt{m^2(\omega^2 - \gamma^2)^2 + \beta^2\gamma^2}} \cos(\gamma t - \varepsilon) + r\, e^{\mu t} \cos(\nu t - \delta), \tag{10.109} $$

where $\lambda = \mu \pm i\,\nu$ are the roots of the characteristic equation, so that $\mu = -\beta/(2m) < 0$, while $r$, $\delta$ are determined
by the initial conditions, cf. (10.89). The second term (the solution to the homogeneous
equation) is known as the transient, since it decays exponentially fast to zero. Thus,
at large times, any internal motions of the system that might have been excited by the
initial conditions die out, and only the particular solution (10.107) incited by the continued
forcing persists.
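The amplitude and phase lag formulas (10.107) and (10.108) translate directly into a few lines of code. The following Python sketch is illustrative only: the function names and the parameter values $m = 1$, $\beta = 0.5$, $k = 4$, $\alpha = 1$ are assumptions chosen to lie in the underdamped regime. It computes the persistent response and checks, by direct substitution, that it satisfies the damped forced equation.

```python
import math

def steady_state(gamma, m, beta, k, alpha):
    """Amplitude and phase lag of the persistent response, per (10.107)-(10.108)."""
    stiffness_term = k - m * gamma**2   # equals m (omega^2 - gamma^2)
    friction_term = beta * gamma
    amplitude = alpha / math.hypot(stiffness_term, friction_term)
    # atan2 places the lag in (0, pi) whenever beta and gamma are positive.
    phase_lag = math.atan2(friction_term, stiffness_term)
    return amplitude, phase_lag

def residual(t, gamma, m, beta, k, alpha):
    """Substitute u*(t) = A cos(gamma t - eps) into m u'' + beta u' + k u - alpha cos(gamma t)."""
    A, eps = steady_state(gamma, m, beta, k, alpha)
    phase = gamma * t - eps
    u = A * math.cos(phase)
    du = -A * gamma * math.sin(phase)
    d2u = -A * gamma**2 * math.cos(phase)
    return m * d2u + beta * du + k * u - alpha * math.cos(gamma * t)

# Assumed illustrative parameters with beta < 2 sqrt(m k).
m, beta, k, alpha = 1.0, 0.5, 4.0, 1.0
omega = math.sqrt(k / m)

amp_at_omega, lag_at_omega = steady_state(omega, m, beta, k, alpha)

# Forcing frequency at which the amplitude formula actually peaks.
gamma_peak = math.sqrt(omega**2 - beta**2 / (2.0 * m**2))
amp_at_peak, _ = steady_state(gamma_peak, m, beta, k, alpha)

print(amp_at_omega, lag_at_omega, amp_at_peak)
```

With these values the amplitude at $\gamma = \omega$ equals $\alpha/(\beta\omega) = 1$ and the lag is a quarter period, while the peak amplitude is only marginally larger, which is why forcing at $\omega$ is treated as the worst case when friction is small.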


Exercises 

10.6.1. Graph the following functions. Describe the fast oscillatory and beat frequencies:

(a) $\cos 8t - \cos 9t$, (b) $\cos 26t - \cos 24t$, (c) $\cos 10t + \cos 9.5t$, (d) $\cos 5t - \sin 5.2t$.

10.6.2. Solve the following initial value problems: (a) $\ddot u + 36u = \cos 3t$, $u(0) = 0$, $\dot u(0) = 0$.
(b) $\ddot u + 6\dot u + 9u = \cos t$, $u(0) = 0$, $\dot u(0) = 1$. (c) $\ddot u + \dot u + 4u = \cos 2t$, $u(0) = 1$,
$\dot u(0) = -1$. (d) $\ddot u + 9u = 3 \sin 3t$, $u(0) = 1$, $\dot u(0) = -1$. (e) $2\ddot u + 3\dot u + u = \cos \frac12 t$,
$u(0) = 3$, $\dot u(0) = -2$. (f) $3\ddot u + 4\dot u + u = \cos t$, $u(0) = 0$, $\dot u(0) = 0$.

10.6.3. Solve the following initial value problems. In each case, graph the solution and explain
what type of motion is represented. (a) $\ddot u + 4\dot u + 40u = 125 \cos 5t$, $u(0) = 0$, $\dot u(0) = 0$,
(b) $\ddot u + 25u = 3 \cos 4t$, $u(0) = 1$, $\dot u(0) = 1$, (c) $\ddot u + 16u = \sin 4t$, $u(0) = 0$, $\dot u(0) = 0$,
(d) $\ddot u + 6\dot u + 5u = 25 \sin 5t$, $u(0) = 4$, $\dot u(0) = 2$.

10.6.4. A mass $m = 25$ is attached to a unit spring with $k = 1$, and frictional coefficient
$\beta = .01$. The spring will break when it moves more than 1 unit. Ignoring the effect of the
transient, what is the maximum allowable amplitude $\alpha$ of periodic forcing at frequency $\gamma =$
(a) .19? (b) .2? (c) .21?

10.6.5. (a) For what range of frequencies $\gamma$ can you force the mass in Exercise 10.6.4 with
amplitude $\alpha = .5$ without breaking the spring? (b) How large should the friction be so that
you can safely force the mass at any frequency?


10.6.6. Suppose the mass-spring-oil system of Exercise 10.5.39(b) is subject to a periodic external
force $2 \cos 2t$. Discuss, in as much detail as you can, the long-term motion of the mass.

♦ 10.6.7. Write down the solution $u(t, \gamma)$ to the initial value problem $m \frac{d^2u}{dt^2} + k u = \alpha \cos \gamma t$,
$u(0) = \dot u(0) = 0$, for (a) a non-resonant forcing function at frequency $\gamma \neq \omega$;
(b) a resonant forcing function at frequency $\gamma = \omega$.
(c) Show that, as $\gamma \to \omega$, the limit of the non-resonant solution equals the resonant
solution. Conclude that the solution $u(t, \gamma)$ depends continuously on the frequency $\gamma$ even
though its mathematical formula changes significantly at resonance.


♦ 10.6.8. Justify the solution formulas (10.107) and (10.108).

10.6.9. (a) Does a function of the form $u(t) = a \cos \gamma t - b \cos \omega t$ still exhibit beats when $\gamma \approx \omega$,
but $a \neq b$? Use a computer to graph some particular cases and discuss what you observe.
(b) Explain to what extent the conclusions based on (10.103) do not depend upon the
choice of initial conditions (10.102).


Electrical  Circuits 

The  Electrical-Mechanical  Correspondence  outlined  in  Section  6.2  will  continue  to  operate 
in  the  dynamical  universe.  The  equations  governing  the  equilibria  of  simple  electrical 
circuits  and  the  mechanical  systems  such  as  mass-spring  chains  and  structures  all  have  the 
same  underlying  mathematical  structure.  In  a  similar  manner,  although  they  are  based 
on  a  completely  different  set  of  physical  principles,  circuits  with  dynamical  currents  and 
voltages  are  modeled  by  second  order  linear  dynamical  systems  of  the  Newtonian  form 
presented  earlier. 

In this section, we briefly analyze the very simplest situation: a single loop containing
a resistor $R$, an inductor $L$, and a capacitor $C$, as illustrated in Figure 10.14. This basic
RLC circuit serves as the prototype for more general electrical networks linking various resistors,
inductors, capacitors, batteries, voltage sources, etc. (Extending the mathematical
analysis to more complicated circuits would make an excellent in-depth student research
project.) Let $u(t)$ denote the current in the circuit at time $t$. We use $v_R, v_L, v_C$ to denote
the induced voltages in the three circuit elements; these are prescribed by the fundamental
laws of electrical circuitry.

(a) First, as we learned in Section 6.2, the resistance $R > 0$ is the proportionality factor
between voltage and current, so $v_R = R\,u$.

(b) The voltage passing through an inductor is proportional to the rate of change in the
current. Thus, $v_L = L\,\dot u$, where $L > 0$ is the inductance, and the dot indicates
time derivative.

(c) On the other hand, the current passing through a capacitor is proportional to the rate
of change in the voltage, and so $u = C\,\dot v_C$, where $C > 0$ denotes the capacitance.
We integrate† this relation to produce the capacitor voltage $v_C = \int \frac{u(t)}{C}\, dt$.

† The integration constant is not important, since we will differentiate the resulting equation.

Figure 10.14. The Basic RLC Circuit.


The Voltage Balance Law tells us that the total of these individual voltages must equal
any externally applied voltage $v_E = F(t)$ coming from, say, a battery or generator. Therefore,

$$ v_R + v_L + v_C = v_E. $$

Substituting the preceding formulas, we deduce that the current $u(t)$ in our circuit satisfies
the following linear integro-differential equation:

$$ L \frac{du}{dt} + R\,u + \int \frac{u}{C}\, dt = F(t). \tag{10.110} $$

We can convert this into a differential equation by differentiating both sides with respect
to $t$. Assuming, for simplicity, that $L$, $R$, and $C$ are constant, the result is the linear second
order ordinary differential equation

$$ L \frac{d^2u}{dt^2} + R \frac{du}{dt} + \frac{1}{C}\, u = F'(t). \tag{10.111} $$

The current will be uniquely specified by the initial conditions $u(t_0) = a$, $\dot u(t_0) = b$.

Comparing  (10.111)  with  the  equation  (10.87)  for  a  mechanically  vibrating  mass,  we  see 
that  the  correspondence  between  electrical  circuits  and  mechanical  structures  developed 
in  Chapter  6  continues  to  hold  in  the  dynamical  regime.  The  current  u  corresponds  to 
the  displacement.  The  inductance  L  plays  the  role  of  mass,  the  resistance  R  corresponds, 
as  before,  to  friction,  while  the  reciprocal  1/C  of  capacitance  is  analogous  to  the  spring 
stiffness. Thus, all of our analytical conclusions regarding stability of equilibria, qualitative
behavior, solution formulas, etc., that we established in the mechanical context can,
suitably  re-interpreted,  be  immediately  applied  to  electrical  circuit  theory. 


In particular, an RLC circuit is underdamped if $R^2 < 4L/C$, and the current $u(t)$
oscillates with frequency

$$ \nu = \sqrt{\frac{1}{CL} - \frac{R^2}{4L^2}}, \tag{10.112} $$

while dying off to zero at an exponential rate $e^{-Rt/(2L)}$. In the overdamped and critically
damped cases $R^2 \geq 4L/C$, the resistance in the circuit is so large that the current
merely decays to zero at an exponential rate and no longer exhibits any oscillatory behavior.
Attaching an alternating current source $F(t) = \alpha \cos \gamma t$ to the circuit can induce a
catastrophic resonance if there is no resistance and the forcing frequency is equal to the
circuit’s natural frequency.
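The damping criterion and the frequency formula (10.112) can be packaged as a small helper. The Python sketch below is a hypothetical illustration: the function names, the tolerance used to detect the critical case, and the sample component values are all assumptions rather than anything prescribed by the text.

```python
import math

def classify_rlc(R, L, C):
    """Classify an unforced RLC loop by comparing R^2 with 4 L / C."""
    discriminant = R**2 - 4.0 * L / C
    if math.isclose(discriminant, 0.0, abs_tol=1e-12):
        return "critically damped"
    return "underdamped" if discriminant < 0 else "overdamped"

def ringing_frequency(R, L, C):
    """Frequency (10.112) of the decaying oscillation; None unless underdamped."""
    if classify_rlc(R, L, C) != "underdamped":
        return None
    return math.sqrt(1.0 / (C * L) - R**2 / (4.0 * L**2))

def decay_rate(R, L):
    """Exponent R / (2 L) in the underdamped envelope exp(-R t / (2 L))."""
    return R / (2.0 * L)

# Assumed sample values: unit inductance and capacitance, varying resistance.
for R in (1.0, 2.0, 5.0):
    print(R, classify_rlc(R, 1.0, 1.0), ringing_frequency(R, 1.0, 1.0))
```

By the electrical-mechanical correspondence, the same helper classifies a damped mass on a spring after substituting $m$ for $L$, $\beta$ for $R$, and $k$ for $1/C$.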


Exercises 

10.6.10. Classify the following RLC circuits as (i) underdamped, (ii) critically damped, or
(iii) overdamped: (a) $R = 1$, $L = 2$, $C = 4$, (b) $R = 4$, $L = 3$, $C = 1$,
(c) $R = 2$, $L = 3$, $C = 3$, (d) $R = 4$, $L = 10$, $C = 2$, (e) $R = 1$, $L = 1$, $C = 3$.

10.6.11. Find the current in each of the unforced RLC circuits in Exercise 10.6.10 induced by
the initial data $u(0) = 1$, $\dot u(0) = 0$.

10.6.12. A circuit with $R = 1$, $L = 2$, $C = 4$, includes an alternating current source
$F(t) = 25 \cos 2t$. Find the solution to the initial value problem $u(0) = 1$, $\dot u(0) = 0$.

10.6.13. A superconducting LC circuit has no resistance: $R = 0$. Discuss what happens when
the circuit is wired to an alternating current source $F(t) = \alpha \cos \gamma t$.

10.6.14. A circuit with $R = .002$, $L = 12.5$, and $C = 50$ can carry a maximum current of
250. Ignoring the effect of the transient, what is the maximum allowable amplitude $\alpha$ of an
applied periodic current $F(t) = \alpha \cos \gamma t$ at frequency $\gamma =$ (a) .04? (b) .05? (c) .1?

10.6.15. Given the circuit in Exercise 10.6.14, over what range of frequencies $\gamma$ can you supply
a unit amplitude periodic current source?

10.6.16. How large should the resistance in the circuit in Exercise 10.6.14 be so that you can
safely apply any unit amplitude periodic current?


Forcing  and  Resonance  in  Systems 


Let  us  conclude  by  briefly  discussing  the  effect  of  periodic  forcing  on  a  system  of  second 
order  ordinary  differential  equations.  Periodically  forcing  an  undamped  mass-spring  chain 
or  structure,  or  a  resistanceless  electrical  network,  leads  to  a  second  order  system  of  the 
form 


$$ M \frac{d^2\mathbf{u}}{dt^2} + K \mathbf{u} = \cos(\gamma t)\, \mathbf{a}. \tag{10.113} $$

Here $M > 0$ and $K \geq 0$ are $n \times n$ matrices as above, cf. (10.82), while $\mathbf{a} \in \mathbb{R}^n$ is a constant
vector representing both a magnitude and a “direction” of the forcing and $\gamma$ is the forcing
frequency. Superposition is used to determine the effect of several such forcing functions.
As always, the solution to the inhomogeneous system is composed of one particular response
to the external force combined with the general solution to the homogeneous system, which,
in the stable case $K > 0$, is a quasi-periodic combination of the normal vibrational modes.

To find a particular solution to the inhomogeneous system, let us try the trigonometric
ansatz

$$ \mathbf{u}^*(t) = \cos(\gamma t)\, \mathbf{w} \tag{10.114} $$

in which $\mathbf{w}$ is a constant vector. Substituting into (10.113) leads to a linear algebraic
system

$$ (K - \mu M)\, \mathbf{w} = \mathbf{a}, \qquad \text{where} \qquad \mu = \gamma^2. \tag{10.115} $$

If the linear system (10.115) has a solution, then our ansatz (10.114) is valid, and we have
produced a particular vibration of the system (10.113) possessing the same frequency as
the forcing vibration. In particular, if $\mu = \gamma^2$ is not a generalized eigenvalue of the matrix
pair $K, M$, as described in (10.85), then the coefficient matrix $K - \mu M$ is nonsingular, and
so (10.115) can be uniquely solved for any right-hand side $\mathbf{a}$. The general solution, then,
will be a quasi-periodic combination of this particular solution coupled with the normal
mode vibrations at the natural, unforced frequencies of the system.

The more interesting case occurs when $\gamma^2 = \mu$ is a generalized eigenvalue, and so $K - \mu M$
is singular, its kernel being equal to the generalized eigenspace $V_\mu = \ker(K - \mu M)$. In this
case, (10.115) will have a solution $\mathbf{w}$ if and only if $\mathbf{a}$ lies in the image of $K - \mu M$. According
to the Fredholm Alternative Theorem 4.46, the image is the orthogonal complement of
the cokernel, which, since the coefficient matrix is symmetric, is the same as the kernel.
Therefore, (10.115) will have a solution if and only if $\mathbf{a}$ is orthogonal to $V_\mu$, i.e., $\mathbf{a} \cdot \mathbf{v} = 0$ for
every generalized eigenvector $\mathbf{v} \in V_\mu$. Thus, one can force a system at a natural frequency
without inciting resonance, provided that the “direction” of forcing, as determined by the
vector $\mathbf{a}$, is orthogonal (in the linear algebraic sense) to the natural directions of motion
of the system, as governed by the eigenvectors for that particular frequency.
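The orthogonality criterion can be seen at work on a small example. The sketch below is an assumed illustration and not from the text: it takes $M = I$, a symmetric $2 \times 2$ stiffness matrix with eigenvalue $\mu = 1$ and eigenvector $(1, 1)^T$, and a forcing direction chosen orthogonal to that eigenvector. The helper names and the hand-computed solution `w` are hypothetical.

```python
def mat_vec(A, x):
    """Multiply a small dense matrix (list of rows) by a vector (list)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(x, y):
    """Euclidean dot product of two vectors given as lists."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

# Assumed data: M = I and a symmetric K whose eigenvalue mu = 1
# has eigenvector v = (1, 1).
K = [[3.0, -2.0], [-2.0, 3.0]]
mu = 1.0
K_minus_muM = [[K[0][0] - mu, K[0][1]], [K[1][0], K[1][1] - mu]]
v = [1.0, 1.0]

# Forcing direction orthogonal to v, so (K - mu M) w = a should be solvable
# even though the forcing frequency gamma = sqrt(mu) = 1 is a natural frequency.
a = [1.0, -1.0]
w = [0.5, 0.0]  # one particular solution, found by hand

kernel_check = mat_vec(K_minus_muM, v)   # zero vector: v spans the kernel
orthogonality = dot(a, v)                # zero: a is orthogonal to the eigenspace
solve_check = mat_vec(K_minus_muM, w)    # reproduces a

print(kernel_check, orthogonality, solve_check)
```

Here $\mathbf{u}^*(t) = \cos(t)\, \mathbf{w}$ is a bounded periodic response. Replacing `a` by a vector with a nonzero component along `v`, such as `[1.0, 0.0]`, makes the singular system unsolvable, which is the truly resonant situation.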

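If the orthogonality condition is not satisfied, then the periodic solution ansatz (10.114)
does not apply, and we are in a truly resonant situation. Inspired by the scalar solution,
let us try a resonant solution ansatz

$$ \mathbf{u}^*(t) = t \sin(\gamma t)\, \mathbf{y} + \cos(\gamma t)\, \mathbf{w}. \tag{10.116} $$

Since

$$ \frac{d^2\mathbf{u}^*}{dt^2} = -\gamma^2\, t \sin(\gamma t)\, \mathbf{y} + \cos(\gamma t)\, (2\gamma\, \mathbf{y} - \gamma^2\, \mathbf{w}), $$

the function (10.116) will solve the differential equation (10.113) provided

$$ (K - \mu M)\, \mathbf{y} = \mathbf{0}, \qquad (K - \mu M)\, \mathbf{w} = \mathbf{a} - 2\gamma\, \mathbf{y}, \qquad \mu = \gamma^2. \tag{10.117} $$

(As written, the second equation takes $M = I$, the situation in Example 10.44; for a general
mass matrix the term $2\gamma\, \mathbf{y}$ is replaced by $2\gamma M \mathbf{y}$.)
The first equation requires that $\mathbf{y} \in V_\mu$ be a generalized eigenvector of the matrix pair
$K, M$. Again, the Fredholm Alternative implies that, since the coefficient matrix $K - \mu M$
is symmetric, the second equation will be solvable for $\mathbf{w}$ if and only if $\mathbf{a} - 2\gamma\, \mathbf{y}$ is orthogonal
to the generalized eigenspace $V_\mu = \operatorname{coker}(K - \mu M) = \ker(K - \mu M)$. Thus, the vector $2\gamma\, \mathbf{y}$
is required to be the orthogonal projection of $\mathbf{a}$ onto the eigenspace $V_\mu$. With this choice
of $\mathbf{y}$ and $\mathbf{w}$, formula (10.116) defines the resonant solution to the system.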


Theorem 10.43. An undamped vibrational system will be periodically forced into resonance
if and only if the forcing $\mathbf{f} = \cos(\gamma t)\, \mathbf{a}$ is at a natural frequency of the system and
the direction of forcing $\mathbf{a}$ is not orthogonal to the natural direction(s) of motion of the
system at that frequency.


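† We can safely ignore the arbitrary multiple of the generalized eigenvector that can be added
to $\mathbf{w}$ as we only need find one particular solution; these will reappear anyway once we assemble
the general solution to the system.

Example 10.44. Consider the periodically forced system

$$ \frac{d^2\mathbf{u}}{dt^2} + \begin{pmatrix} 3 & -2 \\ -2 & 3 \end{pmatrix} \mathbf{u} = \begin{pmatrix} \cos t \\ 0 \end{pmatrix}. $$

The eigenvalues of the coefficient matrix are $\lambda_1 = 5$, $\lambda_2 = 1$, with corresponding orthogonal
eigenvectors $\mathbf{v}_1 = (-1, 1)^T$, $\mathbf{v}_2 = (1, 1)^T$. The internal frequencies are $\omega_1 = \sqrt{\lambda_1} = \sqrt{5}$,
$\omega_2 = \sqrt{\lambda_2} = 1$, and hence we are forcing at a resonant frequency. To obtain the resonant
solution (10.116), we first note that $\mathbf{a} = (1, 0)^T$ has orthogonal projection $\mathbf{p} = (\frac12, \frac12)^T$
onto the eigenline spanned by $\mathbf{v}_2$, and hence $\mathbf{y} = \frac12 \mathbf{p} = (\frac14, \frac14)^T$. We can then solve

$$ (K - I)\, \mathbf{w} = \begin{pmatrix} 2 & -2 \\ -2 & 2 \end{pmatrix} \mathbf{w} = \mathbf{a} - \mathbf{p} = \begin{pmatrix} \frac12 \\ -\frac12 \end{pmatrix} \qquad \text{for} \qquad \mathbf{w} = \begin{pmatrix} \frac14 \\ 0 \end{pmatrix}. $$

Therefore, the particular resonant solution is

$$ \mathbf{u}^*(t) = (t \sin t)\, \mathbf{y} + (\cos t)\, \mathbf{w} = \begin{pmatrix} \frac14 t \sin t + \frac14 \cos t \\ \frac14 t \sin t \end{pmatrix}. $$

The general solution to the system is

$$ \mathbf{u}(t) = \begin{pmatrix} \frac14 t \sin t + \frac14 \cos t \\ \frac14 t \sin t \end{pmatrix} + r_1 \cos(\sqrt{5}\, t - \delta_1) \begin{pmatrix} -1 \\ 1 \end{pmatrix} + r_2 \cos(t - \delta_2) \begin{pmatrix} 1 \\ 1 \end{pmatrix}, $$

where the amplitudes $r_1, r_2$ and phase shifts $\delta_1, \delta_2$ are fixed by the initial conditions.
Eventually the resonant terms involving $t \sin t$ dominate the solution, inducing progressively
larger and larger oscillations.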
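The resonant solution of Example 10.44 can be verified by substituting it back into the system. The Python sketch below does this at a few sample times. It assumes the stiffness matrix has off-diagonal entries $-2$ and that $\mathbf{y} = (\frac14, \frac14)^T$, $\mathbf{w} = (\frac14, 0)^T$; the function names and sample times are illustrative choices.

```python
import math

K = [[3.0, -2.0], [-2.0, 3.0]]   # stiffness matrix of the example (M = I)
y = [0.25, 0.25]                 # coefficient vector of t sin t
w = [0.25, 0.0]                  # coefficient vector of cos t

def u_star(t):
    """Resonant ansatz u*(t) = t sin(t) y + cos(t) w."""
    return [t * math.sin(t) * y_i + math.cos(t) * w_i for y_i, w_i in zip(y, w)]

def u_star_second_derivative(t):
    """Uses d^2/dt^2 [t sin t] = 2 cos t - t sin t and d^2/dt^2 [cos t] = -cos t."""
    return [(2.0 * math.cos(t) - t * math.sin(t)) * y_i - math.cos(t) * w_i
            for y_i, w_i in zip(y, w)]

def residual(t):
    """u'' + K u - (cos t, 0)^T, which should vanish for every t."""
    u = u_star(t)
    d2u = u_star_second_derivative(t)
    Ku = [sum(k_ij * u_j for k_ij, u_j in zip(row, u)) for row in K]
    forcing = [math.cos(t), 0.0]
    return [d2u[i] + Ku[i] - forcing[i] for i in range(2)]

sample_times = (0.0, 0.7, 3.0, 12.5, 40.0)
worst = max(abs(r) for t in sample_times for r in residual(t))
print(f"largest residual: {worst:.2e}")
```

Both components of `u_star` carry the factor $\frac14 t \sin t$, so their amplitudes grow without bound, in line with the closing remark of the example.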


Exercises 


10.6.17.  Find  the  general  solution  to  the  following  forced  second  order  systems: 


(a) 


(c) 


(e) 


d2u 

dt2 

d2  u 
dt2 


+ 


+ 


3  0 
0  5 


7 

-2 

13 

-6 


d2  u 
dt2 


-2 

4 

-6 


+ 


u 


u 


cos  t 
0 

5  cos  2 1 
cos  2 1 


(h) 


4 

-2 


2 

3 


u 


d2u 

dt2 

(h) 


cos  t 
11  sin  2 1 


+ 

2 

0 


5 

2 


-2 

3 


u 


0 

3 

(0 


d2u 

dt2 

d2  u 
dt2 


+ 


+ 


/ 


V 


cos  \  t 
—  cos  \  t 


6 

-4 

n 

/  cos  t\ 

4 

6 

-i 

u  = 

0 

1 

-1 

ib 

\  cos  t J 

10.6.18. (a) Find the resonant frequencies of a mass-spring chain consisting of two masses,
$m_1 = 1$ and $m_2 = 2$, connected to top and bottom supports by identical springs with unit
stiffness. (b) Write down an explicit forcing function that will excite the resonance.

10.6.19. Suppose one of the fixed supports is removed from the mass-spring chain of Exercise
10.6.18. Does your forcing function still excite the resonance? Do the internal vibrations
of the masses (i) speed up, (ii) slow down, or (iii) remain the same? Does your answer
depend upon which of the two supports is removed?

4»  10.6.20.  Find  the  resonant  frequencies  of  the  following  structures,  assuming  the  nodes  all  have 
unit  mass.  Then  find  a  means  of  forcing  the  structure  at  one  of  the  resonant  frequencies, 
and  yet  not  exciting  the  resonance.  Can  you  also  force  the  structure  without  exciting  any 
mechanism  or  rigid  motion?  (a)  The  square  truss  of  Exercise  6.3.5;  ( b )  the  joined  square 
truss  of  Exercise  6.3.6;  (c)  the  house  of  Exercise  6.3.8;  (d)  the  triangular  space  station 
of  Example  6.6;  (e)  the  triatomic  molecule  of  Example  10.41;  (f)  the  water  molecule  of 
Exercise 10.5.30.

addition  of  scalars 

8 

A  +  B 

addition  of  matrices 

5 

V  +  w 

addition  of  vectors 

5,  76 

v  +  w 

addition  of  subspaces 

86 

f  +  9 

addition  of  functions 

79 

cd 

multiplication  of  scalars 

8 
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6 
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• 
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hyperbolic  secant  function 

270 

sign 

sign  of  permutation 

72 

sin 

sine  function 

xvii,  176 

sinh 

hyperbolic  sine  function 

176 

span 

span 

87 

supp 

support  of  a  function 

551 

monomial  sample  vector 

265 

Tk 

Chebyshev  polynomial 

233 

7 -'(n) 

space  of  trigonometric  polynomials 
of  degree  <  n 

90,  190 

T'(oo) 

space  of  all  trigonometric  polynomials 
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609 

complex  exponential  175,  180,  183,  192,  285, 
287,  390,  549 
complex  inequality  177 
complex  inner  product  184 
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continuum  physics  156 
contraction  600 

control  system  vii,  xv,  76,  99,  106,  376 
control  theory  235 
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damped  565,  498,  621,  623 
critically  622,  629 
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Daubechies  function  559-60 
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block  35,  74,  128,  171,  420,  449,  535 
main  7,  43,  52,  492 
off-  7,  32 
sub-  52,  492,  535 
super-  52,  449,  492,  535 


Subject  Index 


649 


diagonal  entry  10,  47,  70,  168,  205,  420,  445, 
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double  eigenvalue  411,  416 
double  root  411,  576 
doubly  stochastic  505 
downhill  236 
drone  200 

dual  basis  350,  352,  369 
dual  linear  function  369,  398 
dual  map  395 
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eigenfunction  183 
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525,  527,  537,  539,  560,  565,  609 
co-  416,  503,  525 

complex  413,  425,  577-8,  587,  603 
dominant  524,  529 
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434,  438,  446,  448,  480,  523,  528,  566, 
572,  618 

eigenvector  matrix  531 
Eigenvektor  408 
Eigenwert  408 
elastic 

bar  301,  322,  608 
beam  279 
body  439 

elastic  deformation  438 
elasticity  xii,  358,  381
electric  charge  313 

electrical  circuit  viii,  xii,  xiv,  122,  129,
196,  235-6,  301,  628
electrical  energy  319 
electrical  engineering  173 
Electrical-Mechanical  Correspondence  321, 
628 

electrical  network  xi,  xv,  120,  301,  311-2,
327,  626,  630
electrical  system  183 
electricity  xi,  311,  315,  403 
electromagnetic  wave  626 
electromagnetism  80,  173,  236,  381 
electromotive  force  311 
electron  311-3 
element  xvi 

finite  220,  235,  400,  521,  541,  547 
real  391 
unit  148 

zero  76,  79,  82,  87,  140,  342 
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536 

regular  Gaussian  14,  18,  171,  268 
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ellipsoid  363,  438-9,  465,  472 
elliptic  system  542 
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elongation  vector  302,  325 
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energy  320,  341,  583,  585 
electrical  319 
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spectral  437 
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engine 
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engineer  xiii,  xviii 

engineering  vii,  ix,  xiii,  1,  156,  227,  235,  301,
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equilibrium  314 
Euler  380,  393 
Fibonacci  481,  486-7 
fixed-point  506,  509,  546,  559,  563 
Fredholm  integral  377 
functional  556 
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Haar  555,  563 
heat  394 

homogeneous  differential  84,  379,  381,  392, 
567,  609,  621 

inhomogeneous  differential  84,  606,  623, 
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inhomogeneous  iterative  479 

integral  vii,  76,  106,  183,  341-2,  376-8,  556 

integro-differential  629 
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linear  differential  vii,  viii,  xiv ,  84,  342,  376, 
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Laplace  381,  383,  385,  393 

matrix  differential  590,  592 
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system  of  ordinary  differential  ix ,  xii-xv, 
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stable  235-6,  301-2,  579,  590,  605,  615 
unstable  235-6,  301,  590 
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equilibrium  mechanics  235 
equilibrium  point  568,  579 
equilibrium  solution  301,  405,  476,  479,  488, 
493,  565,  579,  597,  622 
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equivalence  relation  87 
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equivalent  norm  150,  152 
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experimental  237,  467 
least  squares  235,  251-2,  271,  458 
maximal  261 
measurement  256,  470 
numerical  249,  523,  536 
round-off  x,  55,  199,  206,  544 
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error  function  274 
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Jacobi  Method  xii ,  475,  509-11,  513,  517, 
519-20 

Jacobi  spectral  radius  520 

Jacobian  matrix  605 

jar  623 

jagged  286 

JAVA  14 

joint  120,  322-3 

Jordan  basis  448,  450-1,  453,  480,  488,  576-7 
Jordan  Basis  Theorem  448,  450 
Jordan  block  416-7,  449-50,  453,  598 
Jordan  canonical  form  xii,  xiii,  403,  447,  450,
490,  525,  598

Jordan  chain  447-8,  450-2,  488,  576,  579, 
603-4 

null  447,  451 

Jordan  chain  solution  576-7,  581 
JPEG  555 
junction  311 

K 

kernel  x,  75,  105,  107-8,  114-5,  117,  124-5, 
221,  223-4,  331,  378,  380,  384,  411, 
429,  434,  456,  463,  631 
kill  479 

Kirchhoff’s  Current  Law  313-4,  317 
Kirchhoff’s  Voltage  Law  312,  314 
Krylov  approximation  541-2 
Krylov  subspace  xx,  475,  536-7,  539-40,  546, 
549 

Krylov  vector  537,  547 
Ky  Fan  norm  466 

L 

L1  norm  145,  147,  153,  182,  274 

L2  Hermitian  inner  product  180
L2  inner  product  133,  135,  182,  185,  191,
219,  227,  232,  234,  274,  550-1,  557,  560
L2  norm  133,  145,  152-3,  185,  191
L2  squared  error  274
Lp  norm  145
L∞  norm  145,  147,  152-3,  182
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laborer  504 
Lagrange  data  284 
Lagrange  multiplier  441 
Lagrange  polynomial  262,  284 
Lagrangian  notation  viii 
Laguerre  polynomial  231,  234,  279 
Lanczos  Method  xii,  475,  539 
language  404 

Laplace  equation  381,  383,  385,  393 
Laplace  transform  376 
Laplacian  349,  354,  381,  393 
graph  xv,  301,  317-8,  462,  464 
large  188,  475 

largest  eigenvalue  441,  495,  523-5 
laser  printing  283 
lattice  505 
Law  139 

Hooke’s  303,  306,  309,  322,  327,  625 
Kirchhoff’s  Current  313-4,  317
Kirchhoff’s  Voltage  312,  314
Newton’s  565,  608 
Ohm’s  312,  319 
Voltage  Balance  312,  314,  629 
Law  of  Cosines  139 
LC  circuit  630 

LDL^T  factorization  xi,  45,  167,  437,  542
LDV  factorization  41 
permuted  42 
leading  coefficient  367 
leaf  482 

learning  vii,  235,  404,  467 
least  squares  ix,  xiii-xv,  129,  132,  230,  235,
255,  266,  468
weighted  252,  256,  265 
least  squares  approximation  188,  263,  272 
least  squares  coefficient  266 
least  squares  error  235,  251-2,  271,  458 
weighted  252,  256 
least  squares  line  474 
least  squares  minimizer  183,  237 
least  squares  solution  ix,  xi,  237-8,  250-1, 
317,  403,  458 
Lebesgue  integral  135 
left  eigenvector  416,  503,  525 
left  half-plane  580-1 
left-handed  basis  103,  202,  222 
left  inverse  31,  36,  38,  356 
left  limit  xviii 
left  null  space  x,  113 
Legendre  polynomial  232,  234,  277-8 
Leibniz  rule  594 
Leibnizian  notation  viii 
length  120,  130,  323 
letter  283 
level  curve  585 


level  set  585 
license  479 
light 

speed  of  159 
stroboscopic  286 
light  cone  159-60 
light  ray  160,  235 
limit  xviii 

line  65,  83,  87-8,  237,  239,  254,  259-60,  301, 
343,  363,  370-1,  587,  615 
least  squares  474 
parallel  371 
spectral  energy  437 
stable  587,  589,  591 
tangent  341,  600 
unstable  587,  589 
line  integral  125 
line  segment  473 
linear  vii,  ix,  2,  342

linear  algebra  vii,  xi,  xiii,  xv,  1,  75,  114,  126,
183,  243,  341,  403,  506
Fundamental  Theorem  of  114,  461 
numerical  48 

linear  algebraic  system  vii,  ix,  341-2,  376,
386,  506,  517,  540
linear  analysis  *JC 

linear  approximation  324,  329,  341,  388 
linear  combination  87,  95,  101,  287,  342,  388, 
599,  618 

linear  control  system  vii,  xv,  376
linear  differential  equation  vii,  viii,  xiv,  84,
342,  376,  378

linear  differential  operator  xi,  317,  341-2,
355,  376-7,  379-80,  384
linear  dynamical  system  xii,  565,  603
linear  form  161 

linear  function  ix,  xi-xiv,  xvi,  xvii,  239,  341-2,
349-50,  352,  355,  358,  369-70,  378,
383,  395-6,  599
adjoint  396-7
dual  369,  398
inverse  355
invertible  387
positive  definite  398-9,  401
real  342,  391
self-adjoint  398-9,  436
skew-adjoint  400

linear  independence  75,  99,  177,
341,  570

linear  integral  equation  vii,  76,  106,  183,
341-2,  376-7,  556
linear  integral  operator  xi,  341
linear  iteration  vii,  403,  475
linear  iterative  system  vii,  ix,  xii-xv,  476,
479,  493,  499,  500,  522,  560,  584,  605
linear  map  341-2 
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linear  mathematics  ix 
linear  motion  588-90,  616,  618 
linear  operator  75,  156,  341-3,  347,  376,  437, 
541 

linear  polynomial  187 
linear  programming  235 
linear  superposition  vii,  ix,  xi,  xv,  110,  222,
235,  250,  262,  342,  378,  480,  565,  630
linear  system  vii,  ix,  xi,  4,  6,  20,  23,  40,  59,
63,  67,  75,  99,  105-7,  376,  461,  475,
541,  565,  571,  577
adjoint  112

compatible  ix,  xi,  8,  11,  62,  224
complex  xiv,  566
equivalent  2
forced  565

homogeneous  vii,  xi,  xii,  67,  95,  99,  106,
108,  342,  376,  378,  384,  388,  394,  409,
571,  585

ill-conditioned  57,  211,  461
incompatible  xi,  62
inconsistent  62

inhomogeneous  vii,  xi,  xii,  106,  110-1,  342,
376,  383-4,  388,  394,  565,  585,  605-6,
630

lower  triangular  3,  20 
singular  461 
sparse  xv,  52,  475,  536 
triangular  2,  29,  197,  542 
weak  132,  398,  540 
linear  system  of  ordinary  differential
equations  ix,  xii-xv,  342,  530,  566,
584,  571,  608,  630
second  order  xii,  618
linear  system  operation  2,  23,  37
linear  transformation  ix,  xiii,  xiv,  341-2,  358,
403,  426,  429,  457,  554,  599
self-adjoint  436
linearity  ix,  2
linearization  341,  605 
linearly  dependent  93,  95-6,  100,  571 
linearly  independent  75,  93,  95-6,
99,  100,  161,  185,  341,  380,  423,  448,
570,  594,  599

local  minimum  236,  242,  441 
localized  549 

logarithm  xvii,  258,  269,  599 
London  626 
loop  120 

low-frequency  291,  294 
lower  bidiagonal  52 

lower  triangular  xvi,  3,  16-7,  20,  39,  73,  518 
special  xvi 

strictly  xvi,  16-8,  28,  39,  41-2,  45,  60,  85,
168,  509,  530

lower  unitriangular  xvi,  16-8,  20,  28,  39,
41-3,  45,  60,  85,  168,  530
LP  287 

LU  factorization  x,  xiv,  xvi,  1,  18,  20,  41,  50,
70,  268,  501,  536,  542
permuted  27-8,  60,  70 
Lucas  number  486 

M 

machine  learning  vii,  235,  404,  467 
magic  square  104 
magnitude  630 
main  diagonal  7,  43,  52,  492 
manifold  235,  605 
manufacturing  235 
map  342 
difference  436 
linear  341-2 
perspective  374-5 
scaling  399 
shift  415,  436 
zero  361 
Maple  14,  57 
market  475 

Markov  chain  xii,  475,  499-502
Markov  process  ix,  xiv,  463-4,  563
mass  viii,  110,  236,  301,  311,  341,  609,  615, 
621-3,  629 
center  of  439 

mass  matrix  396,  608,  616,  618,  620,  622,  630 
mass-spring  chain  xi,  xiv,  301,  309,  317,  399,
403,  565,  608,  610,  619,  628,  630
mass-spring  ring  339 
Mathematica  14,  57,  409 
mathematician  xviii 
mathematics  1,  75,  227,  314,  381 
applied  vii,  ix,  x,  1,  48,  230,  475
financial  1 
Matlab  14,  409 

matrix  ix-xi,  xiii,  xiv,  xvii,  1,  3,  48,  75,  105,
133,  223,  341,  343,  407,  445,  457,  475
adjacency  317 
adjoint  396 
affine  372,  603 
approximating  462 
Arnoldi  540 

augmented  12,  24,  36,  60,  66-7 
banded  55 
bidiagonal  52,  536 
block  11,  35,  603 

block  diagonal  35,  74,  128,  171,  420,  449, 
535,  598 

block  upper  triangular  74,  535 
circulant  282,  436 
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coefficient  4,  6,  63,  157,  224,  235,  241,  343, 
476,  479,  484,  499,  508,  528,  531,  566, 
575,  591,  606,  608,  618,  630 
cofactor  112 
commuting  10,  601 

complete  xvi,  403,  424-6,  428,  430-2,  444,
450,  480,  484,  490,  493,  522,  566,  572,
575,  603

complex  5,  181,  212,  226,  536,  566 
complex  diagonalizable  427 
conductance  313,  317 
convergent  488-9,  495-6,  508,  517 
covariance  163,  470-1,  473 
data  462,  467,  470-1,  473 
deflated  420 
degree  317 

diagonal  7-8,  35,  41-2,  45,  85,  159,  168,
171,  204,  304,  313,  327,  400,  408,  425-6,
437,  439-40,  446,  453,  455,  472,  484,
528,  530,  608,  618,  622
diagonalizable  xvi,  403,  424,  426,  428,  437,
450,  520

doubly  stochastic  505 
eigenvalue  530,  531 
eigenvector  531 

elementary  16-7,  32,  38-9,  73,  204,  360,
362

elementary  reflection  206,  210,  418,  532 

Fibonacci  412,  428 

friction  622 

Gauss-Seidel  514-5,  519 

generic  48 

Gram  129,  161-3,  182,  246,  255,  274,  301, 
309,  316,  327,  351,  398,  403,  439,  454, 
456,  462,  470,  543,  622,  630 
graph  Laplacian  xv,  301,  317-8,  462,  464 
Hermitian  181-2,  435,  439 
Hessenberg  535-6,  539,  542 
Hessian  242 

Hilbert  57-8,  164,  212,  276-7,  465,  516,
548,  584

Householder  206,  210-1,  532,  535 
idempotent  16,  109,  216,  419 
identity  7,  8,  16,  25,  31,  70,  200,  409,  588, 
593,  618 

ill-conditioned  56-7,  276-7,  461,  525 
improper  orthogonal  202,  358,  438 
incidence  122,  124,  128,  303,  312,  314,  317, 
325,  327,  462,  616,  622 
incomplete  403,  424,  480,  490,  575,  594,
603

indefinite  159 
inner  product  156 
interchange  25 


matrix  (continued)

inverse  x,  17,  31-3,  38,  40,  44,  72,  102,  111, 
428,  457 

invertible  33,  106,  421,  439 
irregular  505 
Jacobi  509,  511,  515,  519 
Jacobian  605 

Jordan  block  416-7,  449-50,  453,  598 
linearization  605 
lower  bidiagonal  52 

lower  triangular  xvi,  16-7,  20,  39,  73,  518
lower  unitriangular  xvi,  16-8,  20,  28,  39,
41-3,  45,  60,  85,  168,  530
mass  396,  608,  616,  618,  620,  622,  630 
negative  definite  159-60,  171,  581,  583 
negative  semi-definite  159 
nilpotent  16,  418,  453 
nonsingular  xi,  23-4,  28,  32,  39,  42,  44,  62,
85,  99,  106,  204,  367,  380,  422,  457, 
460,  492,  599,  630 
non-square  60,  403 
non-symmetric  157 
normal  44,  446 
normalized  data  470,  473 
orthogonal  xiii,  183,  200,  202,  205,  208,
210,  358,  373,  413,  431,  437,  439,  444,
446,  457,  530,  552
orthogonal  projection  216,  440 
orthonormal  201 
pentadiagonal  516 
perfect  xvi,  424

permutation  25,  27-8,  32,  42,  45,  60,  71-2, 
74,  97,  204-5,  419,  430 
positive  definite  vii,  ix,  xi-xiv,  129,  156-7,
159-61,  164-5,  167,  170-1,  181-2,  204, 
235,  241-2,  244,  246,  252,  301,  304, 
309,  313,  316,  327,  396,  398,  432,  439, 
443,  473,  528,  531,  542,  544,  581,  583, 
608,  618,  622 

positive  semi-definite  xi,  158,  161,  182,
244-5,  301,  316,  320,  433,  454,  470, 
473,  514,  615,  618,  622 
positive  upper  triangular  205,  529-30 
projection  216,  440 

proper  orthogonal  202-3,  205,  222,  358, 
438-9,  600 

pseudoinverse  403,  457,  467 
quadratic  coefficient  241 
rank  one  66 

real  5,  77,  425,  430,  440,  444,  446,  476,
536,  575-6,  595
real  diagonalizable  427,  432
rectangular  31,  453,  457
reduced  incidence  303-4,  316,  318,  330,
335,  339,  622
reduced  resistivity  316 
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matrix  ( continued ) 
reflection  206,  210,  418,  532 
regular  xvi,  13,  18,  42,  45,  52,  70,  85,  501, 
530-1,  536,  542 
regular  transition  501 
resistance  313 
resistivity  314,  316,  320 
rotation  34,  358,  414,  430 
row  echelon  59,  60,  62,  95,  115-6 
self-adjoint  399,  618 
semi-simple  xvi,  424
shift  436 

similar  73,  367,  418,  425-6,  428,  465,  498, 
532,  575,  598 

simultaneously  diagonalizable  428-9 
singular  23,  70,  314,  403,  409,  411-2,  597, 
631 

skew-symmetric  47,  73,  85-6,  204,  400, 
435,  439,  600-2 
SOR  475,  518-9 
sparse  48,  536,  548 
special  lower  triangular  xvi 
special  orthogonal  222 
special  upper  triangular  xvi 
square  4,  18,  23,  31,  33,  45,  403,  416,  426, 
453,  457,  495,  542,  596 
stiffness  305,  309,  320,  327,  611,  615,  618, 
622 

strictly  diagonally  dominant  281,  283, 
421-2,  475,  498,  510,  512,  516,  584 
strictly  lower  triangular  xvi,  16-8,  28,  39,
41-2,  45,  60,  85,  168,  509,  530
strictly  upper  triangular  16,  85,  509
symmetric  xi,  xiv,  45,  85-6,  167,  171,  183,
208,  216,  226,  398-9,  403,  432,  434, 
437,  440-1,  446,  454,  465,  487,  532, 
537,  542,  581,  585,  631 
symmetric  tridiagonal  532 
transition  499-501,  505,  525,  528,  536,  598 
transposed  72,  112,  162,  304,  395 
tricirculant  54,  282-3,  420,  436 
tridiagonal  52,  281,  304,  419,  492,  512,
526,  532,  535-6,  539,  542
unipotent  16-7 
unitary  205,  212,  439,  444-6 
unitriangular  xvi,  16-8,  20,  28,  38-9,  41-3,
45,  60,  85,  168,  530,  543
upper  bidiagonal  52
upper  Hessenberg  535-6,  539,  542
upper  triangular  xvi,  13,  16,  23-4,  28,  39,
70,  204-5,  210,  425,  428,  444-6,  465,
518,  527,  530,  532,  602
upper  unitriangular  xvi,  16,  18,  38,  41-3,
543

Vandermonde  20,  74,  260,  268 
wavelet  204,  224,  552,  554 


matrix  (continued)
weight  256 

weighted  Gram  163,  247 
Young  519-20,  579 
zero  7,  8,  61,  77,  361,  457,  488,  597 
matrix  addition  5,  8,  12,  43,  349 
matrix  algebra  7,  99 
matrix  arithmetic  xiii,  1,  8,  77
matrix  differential  equation  590,  592
matrix  exponential  ix,  xii,  565,  593-4,  597,
599,  601-2,  606 

matrix  factorization  1,  18,  20,  41,  45,  50, 
171,  183,  205,  536 
matrix  logarithm  599 
matrix  multiplication  5,  8,  43,  48,  51,  95, 
106,  223-4,  343,  352,  355,  365,  397, 
403,  429,  457,  475 

matrix  norm  xi,  xii,  xv,  153-4,  156,  460-1,
466,  475-6,  495-7,  496-9,  510,  515,
596

Euclidean  460-1,  497
Frobenius  156,  466
∞  495-6,  499,  510,  515
Ky  Fan  466 
natural  153-4,  495,  499 
matrix  pair  618,  630 
matrix  polynomial  11,  453 
matrix  power  475,  479,  484,  488,  502 
matrix  product  33,  72,  130 
matrix  pseudoinverse  403,  457,  467 
matrix  series  499,  596 
matrix  solution  572 
matrix  square  root  439,  465,  620 
matrix-valued  function  592,  594,  606
max  norm  145,  151 
maximal  error  261 
maximal  rank  456 
maximization  principle  235,  443 
constrained  442-3 

maximum  xvii,  150,  235,  240,  441,  442
mean  10,  84,  148,  348,  467,  470,  473 
arithmetic  148 
geometric  148 
mean  zero  10,  84,  468,  470 
measure  theory  135 
measurement  254,  256,  467-70 
measurement  error  256,  470 
mechanical  force  327 
mechanical  structure  301 
mechanical  vibration  565 
mechanics  viii,  xi,  xii,  129,  156,  183,  236,
342,  396,  439,  565,  627
classical  341,  388,  583
continuum  xi,  235-6,  351,  399
equilibrium  235 
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mechanics  ( continued ) 
fluid  48,  80,  173,  236,  381,  400,  565 
quantum  vii,  viii,  x,  10,  48,  129,  135,  173, 
183,  200,  202,  227,  341,  349,  355,  381, 
388,  437,  467,  583,  599 
relativistic  341,  388 
rigid  body  200,  203 
solid  236 

mechanism  301,  331,  336,  599,  616,  618 

medical  image  102,  295 

medium  ix,  183,  285 

memory  56,  513,  561 

mesh  point  279 

methane  139 

method 

Arnoldi  xii,  475,  538

Back  Substitution  x,  xiii,  xiv,  3,  14,  21,  24,
41,  50,  53,  62,  208,  211,  282,  518
Conjugate  Gradient  476,  542,  544-5,  548
deflation  420,  526
direct  475,  536

Forward  Substitution  xiii,  xiv,  3,  20,  49,
53,  282,  518

Full  Orthogonalization  (FOM)  xii,  476,
541,  546

Gauss-Seidel  xii,  475,  512,  514,  517,  519-20
Gaussian  Elimination  ix-xiv,  1,  14,  24,  28,
40,  49,  56-7,  67,  69,  72,  102,  129,  167,
208,  237,  253,  378,  407,  409,  412,  475,
506,  508-9,  536

Generalized  Minimal  Residual  (GMRES)
xii,  476,  546-7,  549

Gram-Schmidt  ix,  xi,  xv,  183,  192,  194-5,
198-9,  205,  208,  215,  227,  231,  249,
266,  445,  475,  527,  529,  538
Householder  209,  211-2,  532 
Inverse  Power  526 
iterative  vii,  xv,  403,  475,  506,  536
Jacobi  xii,  475,  509-11,  513,  517,  519-20
Lanczos  xii,  475,  539
naive  iterative  517

Power  xii,  475,  522,  524,  529,  536-7,  568
regular  Gaussian  Elimination  14,  18,  171,
268

semi-direct  xii,  475,  536,  547
Shifted  Inverse  Power  526-7,  534,  539
Singular  Value  Decomposition  xii,  403,
455,  457,  461,  473
Strassen  51

Successive  Over-Relaxation  xii,  475,  517-20
undetermined  coefficient  372,  385-6,  500, 
623 

tridiagonal  elimination  5,  42 
metric  159 
microwave  626 
Midpoint  Rule  271 


milk  486 

minimal  polynomial  453,  537 
minimization  xiii,  xv,  129,  156,  238,  241,  309,
546,  583

minimization  principle  vii,  ix,  235-6,  320,
342,  402

minimization  problem  xi-xiv,  255 
minimizer  183,  237,  241,  401,  545 
minimum  xvii,  150,  235-6,  240 
global  240 
local  236,  242,  441 
minimum  norm  solution  224,  458 
mining 
data  467 

Minkowski  form  375 
Minkowski  inequality  145-6 
Minkowski  metric  159 
Minkowski  space-time  160 
Minneapolis  501 
Minnesota  406,  479,  501,  546 
missile  269 
missing  data  471 
MM^T  factorization  171
mode 

normal  xii,  565,  611
unstable  615,  618-9 
vibrational  616 
model  620 

modeling  viii,  235,  279,  309
modular  arithmetic  xvii 
modulate  624 
modulus  174,  177,  489 
molasses  622 
molecule  139,  437,  608 
benzene  620 
carbon  tetrachloride  620 
triatomic  616,  619,  632 
water  620,  626,  632 
molecular  dynamics  48 
moment  330,  439 
momentum  341,  355 
money  476,  486 
monic  polynomial  227,  453 
monitor  286 

monomial  89,  94,  98,  100,  163-4,  186,  231, 
265,  271,  275 
complex  393 
sampled  265 
trigonometric  190 
monomial  polynomial  268 
monomial  sample  vector  265 
month  477 

mother  wavelet  550,  552,  555-6,  558 
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motion  388,  403,  608,  631 
circular  616 
damped  621 
dynamical  xii,  608 
infinitesimal  616 
internal  388,  627 
linear  588-90,  616,  618 
nonlinear  600 
periodic  621,  624 

rigid  301,  327,  335,  373,  599,  601,  616 
screw  xi,  341,  373,  419,  602 
movie  xii,  200,  285,  287,  293,  341
MP3  183 

multiple  eigenvalue  430 
multiplication  5,  48-9,  53,  79,  261,  348,  457, 
536 

complex  173,  296,  298 
matrix  5,  8,  43,  48,  51,  95,  106,  223-4,  343, 
352,  355,  365,  397,  403,  429,  457,  475 
noncommutative  6,  26,  355,  360,  364,  601 
quaternion  364 
real  296 

scalar  5,  8,  43,  76,  78,  87,  343,  349,  390 
multiplicative  property  596 
multiplicative  unit  7 
multiplicity  416-7,  424,  454,  581 
algebraic  424 
geometric  424 
multiplier 

Lagrange  441 
multipole  547 

multivariable  calculus  x,  235,  242-3,  342,
441,  545,  582
music  287,  626 

N 

naive  iterative  method  517 
natural  boundary  condition  280,  283-4 
natural  direction  631 
natural  frequency  565,  611,  624-5,  630-1 
natural  matrix  norm  153-4,  495,  499 
natural  vibration  614 
Nature  235,  317,  320,  483 
negative  definite  159-60,  171,  581,  583 
negative  eigenvalue  582 
negative  semi-definite  159 
network  viii,  xv,  301,  312,  315,  317-8,  320,
463,  499

communication  464

electrical  xi,  xv,  120,  301,  311-2,  327,  626,
630

newton  112 

Newton  difference  polynomial  268 
Newtonian  equation  618,  621 
Newtonian  notation  viii 


Newtonian  physics  vii 
Newtonian  system  614 
Newton’s  Law  565,  608 
n-gon,  see  polygon
nilpotent  16,  418,  453 
node  120,  311,  313,  315,  317,  339 
ending  311 

improper  588-9,  591,  604 
stable  586,  588-9,  591 
starting  311 
terminating  311,  322 
unstable  587-9,  591 
noise  183,  293,  555 
non-analytic  function  84,  87
non-autonomous  570,  598 
noncommutative  6,  26,  355,  360,  364,  601 
non-coplanar  88 

nonlinear  vii,  255,  341,  388,  611,  616 
nonlinear  dynamics  565,  604,  616 
nonlinear  function  324,  341 
nonlinear  iteration  53,  475 
nonlinear  motion  600 

nonlinear  system  64,  66,  342,  475,  568,  604 
nonnegative  orthant  83 
non-null  eigenvector  434,  454
non-Pythagorean  131 
non-resonant  628 

nonsingular  xi ,  23-4,  28,  32,  39,  42,  44,  62, 
85,  99,  106,  204,  367,  380,  422,  457, 
460,  492,  599,  630 
non-square  matrix  60,  403 
non-symmetric  matrix  157 
nontrivial  solution  67,  95 
nonzero  vector  95 

norm  ix-xi,  xiii,  xiv,  129,  131,  135,  137,  142,
144,  146,  174,  188-9,  237,  245,  489,
495,  581

convergence  in  151
equivalent  150,  152

Euclidean  xiii,  130,  142,  172,  174,  224,  236,
250,  455,  458,  460,  468,  473,  489,  524,
532,  538,  544,  546
Euclidean  matrix  460-1,  497
Frobenius  156,  466
H1  136

Hermitian  445,  489
∞  matrix  495-6,  499,  510,  515
∞  145,  151,  245,  255,  473,  489,  496,  514,
524,  544
Ky  Fan  466

L1  145,  147,  153,  182,  274
L2  133,  145,  152-3,  185,  191
L∞  145,  147,  152-3,  182
matrix  xi,  xii,  xv,  153-4,  156,  460-1,  466,
475-6,  495-7,  496-9,  510,  515,  596
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norm  ( continued ) 
max  145,  151 
minimum  224,  458 
natural  matrix  153-4,  495,  499 

1-  145,  245,  255,  466 
residual  237 
Sobolev  136 

2-  145 

weighted  131,  135,  237,  252,  468 
normal  equation  247,  251,  272,  458 
weighted  247,  252,  256,  317 
normal  matrix  44,  446 
normal  mode  xii,  565,  611 
normal  vector  217 
normalize  468,  470,  473 
normally  distributed  473 
normed  vector  space  144,  372 
north  pole  474 
notation 
dot  viii 

Lagrangian  viii 
Leibnizian  viii 
Newtonian  viii 
prime  viii 

nowhere  differentiable  561 
nuclear  reactor  406 
nucleus  437 

null  direction  159-60,  164,  166 
null  eigenvector  433-4,  615,  618-9 
null  space  x,  106 
left  x,  113 

number  viii,  3,  5,  78 
complex  xvii,  80,  129,  173 
condition  57,  460,  466 
dyadic  561,  563 
Fibonacci  481-3,  485-6 
irrational  611 
Lucas  486 

pseudo-random  464,  487 
random  295,  464 
rational  xvii,  611 
real  xvii,  78,  137,  173,  563 
spectral  condition  525,  591 
tribonacci  486 

numerical  algorithm  viii-xi,  48,  129,  183, 
199,  400,  403,  475,  536,  547 
numerical  analysis  vii,  ix,  xii,  xiii,  xv,  1,  75,
78,  132,  156,  227,  230,  233,  235,  271, 
279,  317,  475 

numerical  approximation  220,  235,  403,  416, 
467 

numerical  artifact  206 
numerical  differentiation  271 
numerical  error  249,  523,  536 
numerical  integration  271,  562 


numerical  linear  algebra  48 

O

object  259 
observable  341 
occupation  504 
octahedron  127,  149 
odd  87,  286 

off-diagonal  7,  32,  252,  420 
offspring  479,  482 
Ohm’s  Law  312,  319 
oil  623,  628 

1  norm  145,  245,  255,  466 
one-parameter  group  599-603 
open  79,  136,  146,  151 
half-  79 
Open  Rule  271 
operation 

arithmetic  48,  199,  212,  534,  536,  548 
elementary  column  72,  74 
elementary  row  12,  16,  23,  37,  60,  70,  418, 
512 

linear  system  2,  23,  37 
operator 

derivative  348,  353 
differential  xi,  317,  341-2,  355,  376-7, 
379-80,  384 

differentiation  348,  355 
integral  xi,  341
integration  347 
Laplacian  349,  354,  381,  393 
linear  75,  156,  341-3,  347,  376,  437,  541 
ordinary  differential  353 
partial  differential  381 
quantum  mechanical  xi 
self-adjoint  183 
Schrödinger  437
optics  235,  375 
optimization  235,  441-2,  466 
constrained  441 
orange  juice  486 
orbit  568 

order  379,  481,  493,  567 
first  565-7,  570-2,  577,  585,  605 
higher  605 
reduction  of  379,  390 
second  xii,  618

stabilization  537,  540,  547,  549 
ordinary  derivative  xviii 
ordinary  differential  equation  vii,  x,  xi,  91,
98-9,  101,  106-7,  301,  322,  342,  376, 
379-80,  385,  390,  403-4,  407,  435,  476, 
479,  566-7,  570,  576,  579,  604,  606, 
608,  627,  630 
homogeneous  390 
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ordinary  differential  equation  ( continued ) 
inhomogeneous  606 
system  of  ix,  xii-xv,  342,  530,  566,  584,
571,  579,  592,  608,  630
ordinary  differential  operator  353 
orientation  122,  201,  311,  313,  317 
origin  88,  343 
ornamentation  322 
orthant  83 

orthogonal  x,  140,  184-5,  189,  213,  216,  222, 
335,  540,  549,  618,  631 
orthogonal  basis  xi,  xiii,  xv,  184,  189,  194,
201,  214,  235,  266,  403,  435,  446,  551,
611,  618

orthogonal  basis  formula  189,  611 
orthogonal  complement  217-9,  221,  431,  631 
orthogonal  eigenvector  432,  436,  611 
orthogonal  function  xi,  183,  559-60
orthogonal  group  203
orthogonal  matrix  xiii,  183,  200,  202,  205,
208,  210,  358,  373,  413,  431,  437,  439,
444,  446,  457,  530,  552
improper  202,  358,  438 
special  222 

proper  202-3,  205,  222,  358,  438-9,  600 
orthogonal  polynomial  xi,  xiv,  141,  183,  186,
227-8,  276-7

orthogonal  projection  xi,  xiii,  xv,  183,  213,
216,  218,  223,  235,  248,  361-2,  440,
457,  471-2,  539,  631
orthogonal  subspace  xv,  183,  216 
orthogonal  system  552 
orthogonal  vector  xiii-xv,  140,  185
orthogonality  vii,  xi,  184,  235,  287,  295,  312,
476,  558,  562
orthogonalization  475
orthonormal  basis  184,  188,  194-6,  198-9,
201,  204,  213,  235,  248,  288,  432,  437-8,
444,  456-7,  460,  475,  528-9,  538
orthonormal  basis  formula  188 
orthonormal  column  444,  455-6 
orthonormal  eigenvector  437 
orthonormal  matrix  201 
orthonormal  rows  455 
orthonormalize  529,  539 
orthonormality  ix,  184 
oscillation  626 
out  of  plane  627 
outer  space  301,  327 
oven  258,  626 
overdamped  622,  629 
overflow  524 

over-relaxation  xii,  475,  517-20
oxygen  620 


P

p  norm  145,  245 
Padé  approximation  261
page  126,  463,  502 
Page  Rank  463,  502 
pair 

matrix  618,  630 

parabola  15,  240,  259-60,  590-1 
paraboloid  83 
parachutist  269 

parallel  65,  83,  88,  93,  137,  142,  147-8,  187, 
371,  500 

parallel  computer  513 
parallelizable  513 
parallelogram  140,  344,  361 
parameter  599 
Cayley-Klein  203
relaxation  518 
variation  of  385,  606,  623 
part 

imaginary  173,  177,  287,  391,  394,  575-6 
real  173,  177,  287,  365,  391,  394,  575-6, 
581,  591,  603 

partial  derivative  xviii,  242,  349
partial  differential  equation  vii,  ix-xii,  xv,
xvii,  99,  101,  106,  129,  173,  200,  227, 
230,  301,  322,  342,  376,  381,  475,  536, 
542,  547,  565 

partial  differential  operator  381 
partial  pivoting  56,  62 
partial  sum  554 

particular  solution  107,  384,  606,  623-5,  630 
partitioning  463 
path  121 

PCA  ix,  xii,  xiv,  xv,  255,  403,  467,  471
peak  90 
peg  279 
pendulum  236 
pentadiagonal  matrix  516 
pentagon  125,  288,  467 
perfect  matrix  xvi,  424
perfect  square  166 
period  611,  621 
period  2  solution  495 
periodic  172,  285,  565,  587 
periodic  boundary  condition  280,  283-4 
periodic  force  xii,  565,  623-4,  626,  630-1
periodic  function  86,  611 
periodic  motion  621,  624 
periodic  spline  280,  283 
periodic  vibration  618 
permutation  26-7,  45,  56,  72,  428 
column  418 
identity  26,  415 
inverse  32,  34 




permutation  (continued)
row  428 
sign  of  72 

permutation  matrix  25,  27-8,  32,  42,  45,  60, 
71-2,  74,  97,  204-5,  419,  430 
permuted  LDV  factorization  42 
permuted  LU  factorization  27-8,  60,  70 
perpendicular  140,  202,  255 
Perron-Frobenius  Theorem  501 
perspective  map  374-5 
perturbation  523,  525,  591 
phase  xviii,  174,  627 
phase-amplitude  89,  587,  610 
phase  lag  627 

phase  plane  567-8,  576,  586,  605 
phase  portrait  xii,  568,  586-90,  623
phase  shift  89,  273,  587,  609 
phenomenon 
Gibbs  562 
physical  model  620 

physics  vii,  ix,  x,  xii,  1,  200,  202,  227,  235, 
301,  314,  327,  341-2,  381,  402,  407, 
437,  499,  565 
continuum  156 
Newtonian  vii 
statistical  464 
piecewise  constant  551 
piecewise  cubic  263,  269,  279-80 
pivot  12,  14,  18,  22-3,  28,  41,  49,  56,  59,  61, 
70,  114,  167,  412 
pivot  column  56 
pivoting  23,  55,  57,  62 
full  57 

partial  56,  62 
pixel  470 
planar  graph  125 
planar  system  565,  585 
planar  vector  field  81,  86 
plane  64,  82,  88,  250,  259,  358 
complex  173,  420,  580 
coordinate  362 
diagonal  362 
left  half-  580-1 
out  of  627 

phase  567-8,  576,  586,  605 
plane  curve  86 
planet  259,  341 
plant  504 
platform  616 
Platonic  solid  127 
plot 

scatter  469,  471,  478 
point  65,  83,  87-8,  235,  250,  570,  587 
boundary  503 

closest  xi,  183,  235,  238,  245-6,  298


critical  240,  242 

data  xi,  235,  237,  254-5,  272,  283,  467,
470,  474 
dyadic  561-2 
equilibrium  568,  579 
fixed  493,  506,  509,  546,  559,  563 
floating  48,  58 
inflection  240 
mesh  279 
saddle  587,  589 
sample  79,  105,  285 
singular  xv,  380 
pointer  56 

Poisson  equation  385,  390,  521 
polar  angle  174 
polar  coordinate  90,  136,  174 
polar  decomposition  439 
polarization  160 
pole  151,  474 
police  196,  254 
polygon  125,  208,  288,  467 
polynomial  xi,  xiv,  75,  78,  83,  89,  91,  94,  98, 
100,  114,  139,  219,  260-1,  413,  440, 
578,  581,  603 
approximating  266,  279 
characteristic  408,  415,  453,  475 
Chebyshev  233 
complete  monomial  268 
constant  78 
cubic  260,  267,  280 
elementary  symmetric  417 
even  86 
factored  415 
harmonic  381,  393 
Hermite  233 

interpolating  260,  262,  271,  279 
Lagrange  262,  284 
Laguerre  231,  234,  279 
Legendre  232,  234,  277-8 
linear  187 
matrix  11,  453 
minimal  453,  537 
monic  227,  453 
Newton  difference  268 
orthogonal  xi,  xiv,  141,  183,  186,  227-8, 
276-7 

piecewise  cubic  263,  269,  279-80 
quadratic  xvi,  167,  185,  190,  235,  240,  260, 
267 

quartic  221,  264,  269,  276 
quintic  233 
radial  277 
sampled  265 
symmetric  417 
Taylor  269,  324,  383

trigonometric  75,  90-1,  94,  176,  190,  273 

unit  148 

zero  78 

polynomial  algebra  78 
polynomial  equation  416 
polynomial  growth  580,  604 
polynomial  interpolation  235,  260,  262,  271, 
279 

polytope  149 

population  259,  475,  479,  482,  487,  504 
portrait 

phase  xii,  568,  586-90,  623
position  254,  355 
general  375 
initial  609-10,  612 

positive  definite  vii,  ix,  xi-xiv,  129,  156-7,
159-61,  164-5,  167,  170-1,  181-2,  204,
235,  241-2,  244,  246,  252,  301,  304,
309,  313,  316,  327,  396,  398,  432,  439,
443,  473,  528,  531,  542,  544,  581,  583,
608,  618,  622

positive  semi-definite  xi,  158,  161,  182,  244-5, 
301,  316,  320,  433,  454,  470,  473,  514, 
615,  618,  622 

positive  upper  triangular  205,  529-30 
positivity  130,  133,  144,  146,  156 
potential 
gravitational  311 
voltage  311-2,  315,  318,  320 
potential  energy  235-6,  244,  309,  320,  583 
potential  theory  173 
power  235,  319-20 
matrix  475,  479,  484,  488,  502 
power  ansatz  380,  479 
power  function  320 

Power  Method  xii,  475,  522,  524,  529,  536-7, 
568 

Inverse  526 

Shifted  Inverse  526-7,  534,  539 
power  series  175 
precision  48,  57,  461 
predator  479 
prestressed  320 
price  258 

prime  notation  viii 
primitive  root  of  unity  288 
principal  axis  465,  472,  487 
Principal  Component  Analysis  ix,  xii,  xiv,  xv,
255,  403,  467,  471-2 
Fundamental  Theorem  of  472 
principal  coordinate  472,  477 
principal  direction  471-4 
principal  standard  deviation  472 
principal  stretch  438-9 


principal  variance  472-3 
principle 

maximization  235,  442-3 
minimization  vii,  ix,  235-6,  320,  342,  402
optimization  235,  441-2,  466 
Reality  391,  484 

superposition  75,  106,  111,  378,  388 
Uncertainty  355 
printing  283 
probabilistic  process  479 
probability  vii,  viii,  xii-xiv,  463,  475,  499
transitional  501-2 
probability  distribution  468 
probability  eigenvector  501-3 
probability  vector  473,  500-1 
problem  viii 

boundary  value  vii,  x,  xi,  xv,  54,  75,  92,  99, 
136,  183,  222,  235,  322,  342,  376-6, 
386,  389,  397,  399,  541-2 
initial  value  376,  386,  570,  594,  598,  606 
minimization  xi-xiv,  255
process  viii,  xii,  xiv,  403,  475,  499
Gram-Schmidt  ix,  xi,  xv,  183,  192,  194-5,
198,  205,  208,  215,  227,  231,  249,  266, 
445,  475,  527,  529,  538 
Markov  ix,  xiv,  463-4,  563
probabilistic  479 
stable  Gram-Schmidt  199,  538 
stochastic  499 
processing 

image  vii,  viii,  x-xii,  xv,  1,  48,  99,  102,  183,
188,  404,  467,  470,  555
signal  vii,  viii,  x-xiii,  1,  75,  80,  99,  102,
129,  183,  188,  235,  272,  293,  476
video  48,  102,  188,  200,  285,  294-5,  341 
processor  56,  513 
product  xvii,  256 
Cartesian  81,  86,  133,  347,  377 
complex  inner  184 
cross  140,  187,  239,  305,  602 
dot  xiii,  129,  137,  146,  162,  176,  178,  193,
201,  265,  351,  365,  396,  431-3,  455,
550

H1  inner  136,  144,  233 
Hermitian  dot  178,  205,  288,  433,  444,  446 
Hermitian  inner  180-1,  192,  435 
inner  ix-xi,  xiii,  xiv,  129-30,  133,  137,  144,
156-7,  163,  179,  237,  245,  347,  350,
395

L2  inner  133,  135,  180,  182,  185,  191,  219, 
227,  232,  234,  274,  550-1,  557,  560 
matrix  33,  72,  130 
real  inner  184,  395,  401 
Sobolev  inner  136,  144,  233 
vector  6,  130 
weighted  inner  131,  135,  182,  246,  265,
309,  396,  435,  543,  618
product  inequality  154 
professional  504 
profit  258 
programming 
computer  14,  28 
linear  235 

projection  341,  353,  374,  467 
orthogonal  xi,  xiii,  xv,  183,  213,  216,  218, 
223,  235,  248,  361-2,  440,  457,  471-2, 
539,  631 
random  471 

projection  matrix  216,  440 
proof  viii 

proper  isometry  373,  419 
proper  orthogonal  matrix  202-3,  205,  222, 
358,  438-9,  600 
proper  subspace  210 
proper  value  408 
proper  vector  408 
property 

multiplicative  596 
pseudo-random  number  464,  487 
pseudocode  14,  24,  31,  49,  56,  206,  212,  536, 
544 

pseudoinverse  403,  457,  467 
Pythagorean  formula  188,  460 
Pythagorean  Theorem  130-2 

Q 

Q.E.D.  xvii 

QR  algorithm  xii,  200,  475,  527,  529,  531-2,
535-6,  538 

QR  factorization  xi,  205,  210,  522,  529,  539
quadrant  81 

quadratic  coefficient  matrix  241 
quadratic  eigenvalue  493 
quadratic  equation  64,  166,  621 
quadratic  form  xi,  86,  157,  161,  166-7,  170,
241,  245,  346,  437,  440-2,  583 
indefinite  159 
negative  definite  159-60 
negative  semi-definite  159 
positive  definite  157,  160 
positive  semi-definite  158 
quadratic  formula  166 
quadratic  function  xi,  xiii,  235,  239-41,  259,
274,  401,  545,  582-3 
quadratic  minimization  problem  xi-xiv 
quadratic  polynomial  xvi,  167,  185,  190,  235,
inconsistent  62

iterative  53,  475,  481,  488,  492-3,  563 
large  475 

linear  vii,  ix,  xi,  4,  6,  20,  23,  40,  59,  63,  67,
75,  99,  105-7,  376,  461,  475,  541,  565, 
571,  577 
linear  algebraic  vii,  ix,  341-2,  376,  386,
506,  517,  540 
lower  triangular  3,  20 
Newtonian  614 
non-autonomous  570,  598 
nonlinear  64,  66,  342,  475,  568,  604 
order  of  481,  493 
orthogonal  552 
planar  565,  585 
singular  461 
sparse  xv,  52,  475,  536 
stochastic  341 

triangular  2,  20,  29,  197,  542 
trivial  590-1 
two-dimensional  585 
undamped  623,  630-1 
unforced  622 

weak  132,  398,  540-1,  546 
system  of  ordinary  differential  equations 

ix,  xii-xv,  342,  530,  566,  584,  571,  579,
592,  608,  630 
second  order  xii,  618 

T 

Tacoma  Narrows  Bridge  626 
tangent  xvii,  341,  600 
target  xvi,  342 
taxi  501 

Taylor  polynomial  269,  324,  383 
Taylor  series  87,  91 
technology  555 
temperature  258 
tension  327 
tensor  4 
inertia  439 
terminal  317 

terminating  node  311,  322 
test 

statistical  467 

tetrahedron  127,  139,  321,  619-20 
theater  293 
theorem  viii 

Cayley-Hamilton  420,  453 
Center  Manifold  604 
Fundamental,  of  Algebra  98,  124,  415 
Fundamental,  of  Calculus  347,  356,  606 
Fundamental,  of  Linear  Algebra  114,  461 
Fundamental,  of  Principal  Component 
Analysis  472 

Gershgorin  Circle  420,  475,  503 
Jordan  Basis  448,  450 
Perron-Frobenius  501 
Pythagorean  130-2 
Rolle’s  231 
Spectral  437,  439,  456,  529
Stability  577,  603 
Weierstrass  Approximation  220 
theory  viii,  xiii 
category  viii 
control  235 
function  135 
graph  viii,  x,  xiv,  12 
group  xii,  464,  599 
measure  135 
potential  173 

thermodynamics  183,  236,  381,  403 
thin  220 

three-dimensional  space  82-3,  88,  99,  200, 
335,  373 

time  viii,  475-6,  568
initial  621 

space-  159-60,  235,  341,  358 
time  reversal  569 
topology  120,  146,  151,  312 
total  variance  472 
tower  322 

trace  10,  85,  170-1,  415,  417,  586,  590-1,
596,  600

trajectory  568,  600,  602,  616 
transform 
Cayley  204 

discrete  Fourier  xi,  xv,  183,  272,  285,  289, 
295 

fast  Fourier  xi,  235,  296 
Fourier  376,  559 
integral  376 
Laplace  376 
wavelet  transform  554 
transformation  342,  358 
affine  ix,  xii,  xiv,  341,  370-3,  377,  419,  603
identity  348,  429 

linear  ix,  xiii,  xiv,  341-2,  358,  403,  426,
429,  457,  554,  599
scaling  429 
self-adjoint  436 
shearing  361 
transient  627 

transition  matrix  499-501,  505,  525,  528,
536,  598 
regular  501 

transitional  probability  501-2 
translation  xi,  328,  331,  341,  346,  370-2,  419, 
550-1,  556-8,  561-2,  601,  616 
transmission  272,  294 

transpose  x,  xii,  43-5,  72,  112,  114,  162,  200,
222,  304,  314,  342,  351,  357,  369,  395, 
397,  399,  416,  422,  456,  502,  597 
Hermitian  205,  444 


trapezoid  126 
Trapezoid  Rule  271,  562 
travel  company  258 
traveling  salesman  505 
tree  127,  322,  467 
tribonacci  number  486 
triangle  125,  363,  371,  467,  505,  632 
equilateral  328,  500,  619 
triangle  inequality  129,  142-4,  146,  154,  179, 
498 
triangular 

block  upper  74,  535 
lower  xvi,  3,  16-7,  20,  39,  73,  518 
positive  upper  205,  529-30 
special  xvi 

strictly  lower  xvi,  16-8,  28,  39,  41-2,  45, 
60,  85,  168,  509,  530 
strictly  upper  16,  85,  509 
upper  xvi,  13,  16,  20,  23-4,  28,  39,  49,  70, 
204-5,  210,  425,  428,  444-6,  465,  518, 
527,  530,  532,  602 
triangular  form  2,  14 
triangular  matrix 

lower  xvi,  16-7,  20,  39,  73,  518 
upper  xvi,  13,  16,  23-4,  28,  39,  70,  204-5, 
210,  425,  428,  444-6,  465,  518,  527, 
530,  532,  602 

triangular  system  2,  20,  29,  197,  542 
triangularize  444 
triatomic  molecule  616,  619,  632 
tricirculant  matrix  54,  282-3,  420,  436 
tridiagonal  matrix  52,  281,  304,  419,  492, 
512,  526,  532,  535-6,  539,  542 
tridiagonal  solution  algorithm  52,  282 
tridiagonalization  532,  535 
trigonometric  ansatz  609,  618,  623,  625,  630 
trigonometric  approximation  271,  273 
trigonometric  function  xi,  xiv,  xvii,  89,  164,
175-6,  183,  235,  272,  292,  578,  580-1 
complex  176-7 
trigonometric  identity  175 
trigonometric  integral  175,  177,  624 
trigonometric  interpolation  86,  235,  287,  293 
trigonometric  monomial  190 
trigonometric  polynomial  75,  90-1,  94,  176, 
190,  273 

trigonometric  series  549 
trigonometric  solution  381 
trivial  solution  67 
trivial  subspace  82,  429 
trivial  system  590-1 
truss  322,  615,  632 
tuning  fork  624 
turkey  258 

two-dimensional  system  585 
2-norm  145 
typography  283

U

unbiased  468 
unbounded  581,  586,  603 
Uncertainty  Principle  355 
uncorrelated  469,  472 
undamped  623,  630-1 
underdamped  621,  627,  629 
underflow  524 
undergraduate  9 
under-relaxed  518 
underwater  vehicle  200 
undetermined  coefficients  372,  385-6,  500, 
623 

unforced  622 
uniform  convergence  562 
uniform  distribution  468 
union  xvii,  86 
unipotent  16-7 

uniqueness  1,  23,  40,  380,  383-4,  401,  479, 
568,  570,  593,  610 
unit  76 
additive  7 
imaginary  xvii,  173 
multiplicative  7 
unit  ball  85,  149-50,  473 
unit  circle  xvii,  132,  288,  442,  530 
unit  cross  polytope  149 
unit  diamond  149 
unit  disk  136,  371,  503 
unit  eigenvector  471,  493,  496 
unit  element  148 
unit  function  148 
unit  octahedron  149 
unit  polynomial  148 
unit  radius  viii 
unit  scalar  8 

unit  sphere  83,  149-50,  363,  375,  438,  465, 
473-4 

unit  square  136 

unit  vector  141,  148,  150,  184,  208,  325,  327, 
441,  443,  524,  532,  576,  602 
unitary  matrix  205,  212,  439,  444-6 
United  States  259 
unitriangular  xvi,  16
lower  xvi,  16-8,  20,  28,  39,  41-3,  45,  60,
85,  168,  530
upper  xvi,  16,  18,  38,  41-3,  543
unity 

root  of  288,  292,  296 
universe  vii,  160,  358,  403,  628 
unknown  4,  6,  506 

unstable  314,  405,  478,  581,  583-4,  586,  591, 
619 


unstable  eigensolution  603 
unstable  equilibrium  235-6,  301,  590 
unstable  focus  588-9,  591 
unstable  improper  node  588-9,  591 
unstable  line  587,  589 
unstable  manifold  605 
unstable  mode  615,  618-9 
unstable  node  587-9,  591 
unstable  solution  viii,  405,  581 
unstable  star  589-90 
unstable  structure  301,  615 
unstable  subspace  604 
upper  Hessenberg  matrix  535-6,  539,  542 
upper  triangular  xvi,  13,  16,  23-4,  28,  39,  70 
block  74,  535 
positive  205,  529-30 
special  xvi 
strictly  16,  85,  509 
upper  triangular  system  20,  49 
upper  unitriangular  xvi,  16,  18,  38,  41-3,  543
uranium  404 

V

valley  236 
value 

absolute  xvii 

boundary  vii,  x,  xi,  xv,  54,  75,  92,  99,  136, 
183,  222,  235,  322,  342,  376-6,  386, 
389,  397,  399,  541-2 
characteristic  408 
expected  468 

initial  376,  386,  570,  594,  598,  606 
proper  408 

sample  79,  256,  260,  272,  286,  479 
singular  vii,  ix,  xii,  xiv,  403,  454-7,  460-2,
466-7,  472-3,  497 

Vandermonde  matrix  20,  74,  260,  268 
variable  viii,  2,  62,  605 
basic  62-3,  118 
change  of  172,  232,  234 
complex  172 

free  62,  63,  67,  96,  108,  119-20,  315 
phase  plane  567 
separation  of  227 
variance  468-71,  473 
principal  472-3 
total  472 
unbiased  468 
variation  xii,  235 

variation  of  parameters  385,  606,  623 
vat  622 

vector  ix,  x,  xiii,  xiv,  xvii,  1,  75,  129,  223,
341,  457,  480,  571,  578
Arnoldi  538-40,  542,  547 
battery  317 
characteristic  408 
circuit  312 

column  4,  6,  46,  48,  77,  130,  350-1 
complex  129,  433 
constant  588 

current  source  314,  318,  320 
data  254 

displacement  302,  312,  325,  620 
elongation  302,  325 
error  254,  508,  514 
force  327 

gradient  349,  545,  582 
Gram-Schmidt  194 
Householder  210-1,  536 
image  473 
initial  475,  540 
Krylov  537,  547 
measurement  468,  470 
mechanism  336 
nonzero  95 
normal  217 
normalized  468,  470 
orthogonal  xiii-xv,  140,  185
parallel  142,  147 
probability  473,  500-1 
proper  408 
real  76,  391 

residual  237,  522,  541,  544-5,  548 
row  4,  6,  130,  350-1 

sample  80,  98,  105,  183,  265,  285-7,  436, 
554 

singular  454,  461,  467,  473 
standard  basis  36,  99,  111,  184,  261,  343, 
349,  356,  426,  449,  450,  529 
state  499 

unit  141,  148,  150,  184,  208,  325,  327,  441, 
443,  524,  532,  576,  602 
velocity  574 
voltage  312,  320 

zero  7,  67,  77,  82,  93,  131,  407,  493 
vector  addition  77,  82,  390 
vector  calculus  353,  365 
vector  field  81,  574 
vector  product  6,  130 
vector  space  x,  xi,  xiii,  xvii,  75-6,  82,  130,
341,  349,  600 

complex  xi,  xiv,  76,  129,  177,  179,  287,  342,
390 

conjugated  390 

finite-dimensional  xiii,  xiv,  101,  149,  356
high-dimensional  467
infinite-dimensional  x,  xiv,  xv,  101,  129,
133,  149,  151,  213,  219-20,  274,  301,
341,  349,  351,  355-6,  396,  401,  541
isomorphic  356
normed  144,  372 
quotient  87,  105,  357 
real  xi,  76,  342

vector-valued  function  80,  98,  136,  341,  605
vector-valued  solution  592 
vehicle  200,  254,  616 
velocity  viii,  80,  254,  259,  365,  615,  618,
620-1 

initial  609-10,  618,  622 
velocity  vector  field  574 
vertex  120,  122,  125-6,  311,  463,  502,  505 
ending  122 
starting  121-2 

vibration  viii,  ix,  xii,  399,  565,  608-9,  611,
621-2,  625-6,  629-31 
internal  565,  624 
mechanical  565 
natural  614 
periodic  618 

quasi-periodic  565,  617-8 
resonant  565,  625 
vibrational  force  630 
vibrational  frequency  610,  621,  624 
vibrational  mode  616 
video  48,  102,  188,  200,  285,  294-5,  341 
vision 

computer  375,  499 
voltage  121,  311-2,  319-20,  626-7 
Voltage  Balance  Law  312,  314,  629 
voltage  potential  311-2,  315,  318,  320 
voltage  vector  312,  320 
Volterra  integral  equation  378 
volume  465 

W

walk 

random  463 
wall  322 
waste  258,  406 
water  583 

water  molecule  620,  626,  632 
wave 

electromagnetic  626 
wave  function  173,  341 
wavelet  xii,  xiv,  xv,  102,  227,  476,  549,  552, 
555,  560 

Daubechies  555,  562 
daughter  550,  556,  563 
Haar  549-50,  552-3,  555,  562 
mother  550,  552,  555-6,  558 
wavelet  basis  102,  189,  204,  283,  550,  552, 
555-6,  562 

wavelet  coefficient  470 
wavelet  matrix  204,  224,  552,  554
wavelet  series  553,  562 
wavelet  transform  554 
weak  formulation  132,  398,  540-1,  546 
weather  48,  407,  499,  501 
web  page  126,  463,  502 
Weierstrass  Approximation  Theorem  220 
weight  132,  135,  256,  502 
atomic  620 
weight  matrix  256 
weighted  adjoint  397 
weighted  angle  138 
weighted  digraph  311,  502 
weighted  Gram  matrix  163,  247 
weighted  graph  311 

weighted  inner  product  131,  135,  182,  246, 
265,  309,  396,  435,  543,  618 
weighted  integral  182 
weighted  least  squares  252,  256,  265 
weighted  norm  131,  135,  237,  252,  468 
weighted  normal  equation  247,  252,  256,  317 
wheel  287 
white  collar  504 
wind  323 

wind  instrument  626 

wire  120,  301,  311-3,  317,  319,  327 

withdrawal  476 

Wronskian  98 


Y 

year  475-6 

Young  matrix  519-20,  579 
YouTube  626 


Z

zero  xvi 

mean  10,  84,  468,  470 
zero  column  59 
zero  correlation  470 
zero  determinant  70 
zero  eigenspace  434 

zero  eigenvalue  412,  421,  433-4,  581,  615 
zero  element  76,  79,  82,  87,  140,  342 
zero  entry  52,  59,  449,  501 
zero  function  79,  83,  134,  343,  405 
zero  map  361 

zero  matrix  7,  8,  61,  77,  361,  457,  488,  597 

zero  polynomial  78 

zero  potential  315 

zero  row  70,  73 

zero  scalar  8 

zero  solution  67,  117,  383-4,  405,  410,  476, 
478,  489-90,  492-3,  581-2 
zero  subspace  82,  429 
zero  vector  7,  67,  77,  82,  93,  131,  407,  493 


